Abstract: We study the profit maximization problem of a cognitive virtual network operator in a dynamic network environment. We
consider a downlink OFDM communication system with various network dynamics, including dynamic user demands, uncertain sensing
spectrum resources, dynamic spectrum prices, and time-varying channel conditions. In addition, heterogeneous users and imperfect
sensing technology are incorporated to make the network model more realistic. By exploiting the special structure of the problem,
we develop low-complexity on-line control policies that determine pricing and resource scheduling without knowing the statistics of
the dynamic network parameters. We show that the proposed algorithms can achieve a profit arbitrarily close to the optimum, with a
proper trade-off against the queuing delay.

Index Terms — Cognitive Radio, Profit Maximization, Pricing, Virtual Network Operator. 



1 Introduction 

The limited wireless spectrum is becoming a bottleneck for 
meeting today's fast growing demands for wireless data ser- 
vices. More specifically, there is very little spectrum left that 
can be licensed to new wireless services and applications. 
However, extensive field measurements [2] showed that much
of the licensed spectrum remains idle most of the time, even 
in densely populated metropolitan areas such as New York 
City and Chicago. A potential way to solve this dilemma is to 
manage and utilize the licensed spectrum resource in a more 
efficient way. 

This is why the concept of Dynamic Spectrum Access 
(DSA) has received enthusiastic support from governments 
and industries worldwide [3]-[5]. We can roughly classify
various DSA approaches into two main categories: the spec- 
trum sensing based ones and the spectrum leasing (or market) 
based ones. The first category indicates a hierarchical access 
model, where unlicensed secondary users opportunistically 
access the under-utilized part of the licensed spectrum, with 
controlled interference to the licensed primary users. During 
this process, spectrum sensing helps the secondary users to 
detect the currently available spectrum resource. In contrast, 
the second category relates to a dynamic exclusive use model, 
which allows licensees to trade spectrum usage rights to the
secondary users. In both categories, it is possible to have a
secondary operator coordinating the transmissions of multiple
secondary users.

• Shuqin Li is with Research and Innovation, Alcatel-Lucent Shanghai Bell
Co., Ltd., D400, Bldg. 3, 388 Ningqiao Rd, Shanghai, 201206, China.
E-mail: Shuqin.Li@alcatel-sbell.com.cn. This work was done when Shuqin Li
was with The Chinese University of Hong Kong.

• Jianwei Huang is with Department of Information Engineering, The
Chinese University of Hong Kong. E-mail: jwhuang@ie.cuhk.edu.hk. Jianwei
Huang is the corresponding author.

• Shuo-Yen Robert Li is with both Department of Information Engineering
and Institute of Network Coding, The Chinese University of Hong Kong.
E-mail: bobli@ie.cuhk.edu.hk.

Part of the results have appeared in IEEE ICC 2012 [1]. This work is
supported by the General Research Funds (Project Number CUHK 412710 and
CUHK 412511) and AoE established under the University Grant Committee
of the Hong Kong Special Administrative Region, China, and grants from
China 973 Prog. No.2012CB315901 & 2012CB315904.

There are pros and cons for both DSA categories. Spectrum 
sensing detects and identifies the available unused licensed 
spectrum through technologies such as beacons, geolocation
systems, and cognitive radio. From the secondary operator's
perspective, the spectrum acquired by sensing is an unreliable
resource, since the operator cannot determine how much
resource is available before sensing. Furthermore, imperfect
sensing may lead to collisions with primary users, and thus
reduce the incentives for the licensee to share the spectrum.
Therefore the secondary operator needs to carefully design its
sensing and access algorithms to keep the collision probability
below an acceptable level. In dynamic spectrum leasing, a secondary
operator acquires the exclusive right to use spectrum within 
a limited time period by paying the corresponding leasing 
price. Thus the spectrum acquired by spectrum leasing is a 
reliable resource. However, the cost can be high compared 
to the spectrum sensing cost, and is dynamically changing 
according to the demand and supply relationship in the market. 

In this paper, we will consider a hybrid model, where a 
secondary operator obtains resources from the primary li- 
censees through both spectrum sensing and dynamic spectrum 
leasing, and provides services to the secondary unlicensed 
users. Our study is motivated by [6], [7], in which the
authors introduced the new concept of Cognitive Mobile
Virtual Network Operator (C-MVNO). The C-MVNO is a
generalization of the existing business model of MVNO [8],
which refers to a network operator that does not own a
licensed frequency spectrum or even wireless infrastructure,
but resells wireless services under its own brand name. The
MVNO business model has been very successful after more
than 10 years of development, and there are more than 600
MVNOs today [9], [10]. The C-MVNO model generalizes the
MVNO model with DSA technologies, which allow the virtual
operator to obtain spectrum resources through both spectrum
sensing and leasing. The C-MVNO model can be applied to
a wide range of wireless scenarios. One example is the IEEE
802.22 standard [11], which suggests that the cognitive radio
network using white space in TV spectrum will operate on
a point-to-multipoint basis (i.e., a base station to customer-
premises equipment). Such a secondary base station can be
operated by a C-MVNO.

The key difference between our work and the ones in
[6], [7] is that we study a much more realistic dynamic
network in this paper. In [6], [7], the authors formulated the
problem based on a static network scenario, and provided
interesting equilibrium results through a one-shot Stackelberg
game. However, a real network is highly dynamic. For
example, users arrive at and leave the system randomly, the
statistics of spectrum availability change over time, and the
spectrum-sensing results are imperfect. The leasing price
is also often unpredictable and changes over time. These
dynamics and realistic concerns make the network model and
the corresponding analysis rather challenging.

In this paper, we focus on the profit maximization problem 
for C-MVNO in a dynamic network scenario. Our key results 
and contributions are summarized as follows. 

• A dynamic network decision model: Our model incor- 
porates various key dynamic aspects of a cognitive radio 
network and the dynamic decision process of a C-MVNO. 
We model sensing channel availability, leasing market 
price, and channel conditions as exogenous stochastic 
processes. 

• Dynamic user demands: We allow users to dynamically 
join the network with random demands (file sizes). The 
demand is affected by both the transmission prices (deci- 
sion variables) and market states (exogenous stochastics). 

• Realistic cognitive radio model: We incorporate various 
practical issues such as imperfect spectrum sensing, pri- 
mary users' collision tolerance, and sensing technology 
selection. The operator needs to choose a sensing technology
that trades off cost against performance.

• A low-complexity on-line control policy: By exploiting
the special structure of the problem, we design a low-
complexity on-line pricing and resource allocation policy,
which achieves a profit arbitrarily close to the operator's
optimum. The policy does not require precise information
about the dynamic network parameters, has a low
system overhead, and is easy to implement.

The remainder of the paper is organized as follows. In
Section 2, we introduce the related work. In Section 3, we
introduce the system model. Section 4 describes the problem
formulation. In Section 5, we propose the profit maximization
control (PMC) policy for homogeneous users and analyze its
performance. We further extend the policy to heterogeneous
users (the M-PMC policy) in Section 6.
Section 7 provides simulation results for both the PMC and
M-PMC policies. Finally, we conclude the paper in Section 8.

2 Related Work 

Among the vast literature on cognitive radio, we will focus
on the results on operator-oriented cognitive radio networks,
where secondary operators play key roles in coordinating
the transmissions of the secondary users. These studies
only started to emerge recently, e.g., [6], [7], [12]-[21]. We
can further classify these studies into two clusters: monopoly
models with one operator, and oligopoly models with multiple
operators.

References [6], [7], [12], [13] studied monopoly models
using the Stackelberg game formulation. Daoud et al. in [12]
proposed a profit-maximizing pricing strategy for the uplink power
control problem in wide-band cognitive radio networks. Yu et
al. in [13] proposed a pricing scheme that can guarantee a
fair and efficient power allocation among the secondary users.

References [14]-[21] looked at the oligopoly issues, either
between two operators [14], [15] or among many operators
[16]-[21]. For the case of two operators, Jia and Zhang in
[14] proposed a non-cooperative two-stage game model to
study the duopoly competition. Duan et al. in [15] formulated
the economic interaction among the spectrum owner, two
secondary operators and the users as a three-stage game. For
the case of many operators, Ileri et al. in [16] developed a
non-cooperative game to model competition of operators in a
mixed commons/property-rights regime under the regulation of
a spectrum policy server. Elias and Martignon in [17] showed
that polynomial pricing functions lead to unique and efficient
Nash equilibrium for the two-stage Stackelberg game between
network operators and secondary users. Niyato et al. in [18]
formulated an evolutionary game for modeling the dynamics of
a multiple-seller, multiple-buyer spectrum trading market. In
addition, several auction mechanisms were proposed to study
the investment problems of cognitive network operators (e.g.,
[19]-[21]).

All results mentioned above considered a rather static 
network model. In contrast, our work adopts a dynamic 
network model to characterize the stochastic nature of wireless 
networks. We will focus on a monopoly model in this paper. 

In this paper, we use Lyapunov stochastic optimization 
to show the optimality and stability of the proposed profit 
maximizing control algorithms. Several closely related
previous results applying Lyapunov stochastic optimization to
wireless networks include [22]-[24]. Huang and Neely in [22]
considered the revenue maximization problem for a conventional
wireless access point without considering cognitive radio
technologies. Urgaonkar and Neely in [23] and Lotfinezhad
et al. in [24] studied cognitive radio networks based on
a user-oriented approach, by designing joint scheduling and
resource allocation algorithms to maximize the utility of a
group of secondary users. Our paper focuses on an operator-
oriented approach to address the profit maximization problem. In
particular, we need to deal with the combinatorial problem of 
channel selection and channel assignment that usually leads to 
a high computational complexity. By discovering and utilizing 
the special problem structure, we design a low-complexity 
algorithm that is suitable for online implementation. 

3 System Model 

Consider a C-MVNO that provides wireless communications 
services to its own secondary users by acquiring spectrum 
resource from some spectrum owner. For example, Google 
may acquire spectrum from AT&T to provide its own wireless 
services through the C-MVNO model. The spectrum owner's 
spectrum can be divided into two types: the sensing band and 
the leasing band. In the sensing band, AT&T serves its own
primary users, but allows Google to identify available
spectrum in this band through spectrum sensing without explicit
communications with AT&T. In the leasing band, AT&T
does not allow spectrum sensing, and instead leases the band to
Google for economic returns.

More specifically, we consider a time-slotted OFDM sys- 
tem, where the C-MVNO serves the downlink transmissions 
from its base station to the secondary users. The system model 
is illustrated in Fig. 1. Secondary users randomly arrive at
the secondary network and request files with random sizes
to be downloaded from the base station. The requested files
are queued at the server in the base station until they are
successfully transmitted to the requesting users.




Fig. 1. Business model of the operator (Cognitive Virtual
Network Operator). [Figure omitted. Labels: sensing band (with
collision bound); leasing band (with dynamic market price);
legend: spectrum hole, unavailable band.]

The rest of the section introduces each part of the system
model in more detail. The C-MVNO (or "operator" for
simplicity) obtains wireless channels through spectrum sensing
(Sections 3.1 and 3.2) and spectrum leasing (Section 3.3), and
allocates power over the obtained channels (Section 3.4).
Secondary users dynamically arrive and request file downloading
services (based on the demand model in Section 3.5), and we
model the requests as a queue (Section 3.6).

3.1 Imperfect Spectrum Sensing 

The sensing band $\mathcal{B}^s_{\max} = \{1, \ldots, B^s_{\max}\}$ includes all channels
that the spectrum owner allows the operator to sense.^1 We
define the state of a channel $i \in \mathcal{B}^s_{\max}(t)$ in time slot $t$ as
$S_i(t)$, which equals 0 if channel $i$ is busy (being used by a
primary user), and equals 1 if channel $i$ is idle.

We assume that $S_i(t)$ is an i.i.d. Bernoulli random variable,
with an idle probability $p_0 \in (0, 1)$ and a busy probability
$1 - p_0$. This approximates reality well if the time slots for
secondary transmissions are sufficiently long or the primary
transmissions are highly bursty [27]. (We will further study
the general Markovian model in Section 5.5.) We define the
sensing state of a channel $i \in \mathcal{B}^s_{\max}$ in time slot $t$ as $W_i(t)$,
which equals 0 if channel $i$ is sensed busy, and 1 if sensed
idle.

1. The operator will collect the sensing information from a sensor network
or geolocation database and provide it to its users, i.e., providing "sensing
as service" [25], [26]. This means that the network can accommodate
legacy mobile devices without cognitive radio capabilities. For more detailed
discussions, see [7].

Notice that $W_i(t)$ may not equal $S_i(t)$ due to imperfect
sensing. The accuracy of spectrum sensing depends on the
sensing technology [28]. If we denote $C^s$ as the sensing cost
(per channel),^2 then we can write the false alarm probability as
$P_{fa}(C^s) = \Pr\{W_i = 0 \mid S_i = 1\}$ (same for all channels $i$) and
the missed detection probability as
$P_{md}(C^s) = \Pr\{W_i = 1 \mid S_i = 0\}$ (same for all channels $i$). Both
functions are decreasing in $C^s$. Intuitively, a better technology will have
a higher cost $C^s$, a lower false alarm probability $P_{fa}(C^s)$,
and a lower missed detection probability $P_{md}(C^s)$. We denote
all choices of cost $C^s$ (and thus the corresponding sensing
technologies) by a finite set $\mathcal{C}^s$.

As different channels have different conditions (to be
explained in detail in Section 3.4), the operator needs to decide
which channels to sense at the beginning of each time slot.
We use $\mathcal{B}^s(t)$ to denote the set of channels sensed by the
operator at time $t$, which satisfies

$$\mathcal{B}^s(t) \subseteq \mathcal{B}^s_{\max}, \quad \forall t. \tag{1}$$

3.2 Collision Constraint 

Missed detections in spectrum sensing lead to transmission
collisions with the primary users. We denote the collision in
channel $i \in \mathcal{B}^s_{\max}$ at time $t$ as a binary random variable
$X_i(t) \in \{0, 1\}$. We have $X_i(t) = (1 - S_i(t)) W_i(t)$, i.e., a
collision happens if and only if the channel is busy but is
sensed idle.
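As a quick numerical illustration of the sensing model, the following sketch computes, for one sensed channel, the probabilities of a successful transmission, a collision, and an unused slot from the idle probability and the two sensing error probabilities. The function name and the numeric values are assumptions chosen for illustration; they are not taken from the paper.

```python
def sensing_outcome_probs(p0, p_fa, p_md):
    """Outcome probabilities for one sensed channel in one slot.

    p0   : probability that the channel is idle (S_i = 1)
    p_fa : false alarm probability, Pr{W_i = 0 | S_i = 1}
    p_md : missed detection probability, Pr{W_i = 1 | S_i = 0}
    """
    # Idle and sensed idle: the secondary transmission succeeds.
    p_success = p0 * (1.0 - p_fa)
    # Busy but sensed idle: X_i = (1 - S_i) * W_i = 1, a collision.
    p_collision = (1.0 - p0) * p_md
    # Sensed busy (correctly or by false alarm): the channel is not used.
    p_unused = p0 * p_fa + (1.0 - p0) * (1.0 - p_md)
    return {"success": p_success, "collision": p_collision, "unused": p_unused}


# Hypothetical numbers chosen only for illustration.
probs = sensing_outcome_probs(p0=0.6, p_fa=0.1, p_md=0.2)
print(probs)
```

The collision probability $(1 - p_0) P_{md}$ is the per-slot expectation of $X_i(t)$ for a sensed channel, which is the quantity the collision constraint bounds on average.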

To protect primary users' transmissions, the operator needs
to ensure that the average collision in each channel $i$ does not
exceed a tolerable level $\eta_i$ (measured in terms of the average
number of collisions per unit time) specified by the spectrum
owner. The tolerable level $\eta_i$ can be channel specific, since the
primary users in different channels may have different QoS
requirements. We define the time-average number of collisions
in channel $i$ as
$\bar{X}_i = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[X_i(\tau)]$. The collision
constraints are

$$\bar{X}_i \le \eta_i, \quad \forall i \in \mathcal{B}^s_{\max}(t). \tag{2}$$

3.3 Spectrum Leasing with a Dynamic Market Price 

A spectrum owner may have some channels that it does not want
to be sensed, either for privacy reasons or for fear of collisions
due to sensing errors. However, these channels may not
always be fully utilized. The spectrum owner can lease the unused
part of these channels to the operator dynamically over time
to earn more revenue. Recall that we denote the set of these
channels as the leasing band $\mathcal{B}^l_{\max} = \{1, \ldots, B^l_{\max}\}$. (In
general, we may represent it as $\mathcal{B}^l_{\max}(t)$, since our model
allows the leasing band to be time-varying. For simplicity of
notation, we denote it as $\mathcal{B}^l_{\max}$ whenever it is clear.) We use
$\mathcal{B}^l(t)$ to denote the set of channels leased by the operator
at time $t$, which satisfies

$$\mathcal{B}^l(t) \subseteq \mathcal{B}^l_{\max}, \quad \forall t. \tag{3}$$

These channels will be exclusively used by the operator in the
current time slot. We denote the leasing price per channel as
$C^l(t)$, which changes stochastically according to the supply
and demand relationship in the spectrum market (which might
involve many spectrum owners and operators). It can be
modeled by an exogenous (not affected by this particular
operator's decisions) random process with countable discrete
states and a stationary distribution (not necessarily known by
the operator).

2. The cost corresponds to, for example, power or time used for sensing.

3.4 Power Allocation 

In wireless networks, channel fading usually arises from
multipath propagation or shadowing by obstacles. To
combat channel fading, the operator needs to allocate
power properly in both sensing channels and leasing
channels to achieve satisfactory data rates. For each channel
$i \in \mathcal{B}_{\max} = \mathcal{B}^s_{\max} \cup \mathcal{B}^l_{\max}$, $h_i(t)$ represents its channel gain
in time slot $t$ and follows an i.i.d. distribution over time.
Different channels have independent and possibly different
channel gain distributions. We assume that secondary users
are homogeneous and experience the same channel condition
for the same channel, although channel conditions can be different
in different channels.^3 (The heterogeneous user scenario will
be further discussed in Section 6.) The operator can measure
$h_i(t)$ for each $i$ at the beginning of each slot $t$, but may not
know the distributions. Let $P_i(t)$ denote the power allocated
to channel $i$ at time $t$. Since we consider a downlink case
here, the operator needs to satisfy the total power constraint
$P_{\max}$ at its base station,

$$\sum_{i \in \mathcal{B}^s(t) \cup \mathcal{B}^l(t)} P_i(t) \le P_{\max}, \quad \forall t. \tag{4}$$

In addition, for a channel $i \in \mathcal{B}^s_{\max}$ in the sensing band,
we use the binary variable $I_i(t) = S_i(t) W_i(t)$ to denote the
transmission result of a secondary user, i.e., $I_i(t) = 1$ if
successful (i.e., $S_i(t) = 1$ and $W_i(t) = 1$) and $I_i(t) = 0$
otherwise (either not sensed, or sensed busy, or sensed idle
but actually busy). Based on the discussion of the leasing
agreement, we have $I_i(t) = 1$ for all channels $i \in \mathcal{B}^l_{\max}$ in the
leasing band. Then the rate in channel $i$ at time slot $t$ is (based
on the Shannon formula)

$$r_i(t) = I_i(t) \log_2(1 + h_i(t) P_i(t)), \tag{5}$$

and the total transmission rate obtained by the operator is

$$r(t) = \sum_{i \in \mathcal{B}^s(t) \cup \mathcal{B}^l(t)} r_i(t). \tag{6}$$

Furthermore, we assume that the operator has a finite
maximum transmission rate, i.e., $r(t) \le r_{\max}, \forall t$, under any
feasible power allocation.
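The rate expressions (5) and (6) can be evaluated directly once the transmission indicators, channel gains, and power levels are fixed. The sketch below is a minimal illustration; the function names and the numbers are assumptions, and the lists are taken to cover exactly the channels selected in the current slot.

```python
import math


def channel_rate(indicator, gain, power):
    """Rate of one channel as in (5): I_i * log2(1 + h_i * P_i)."""
    return indicator * math.log2(1.0 + gain * power)


def total_rate(indicators, gains, powers):
    """Total rate as in (6): sum of per-channel rates over selected channels."""
    return sum(channel_rate(i, h, p) for i, h, p in zip(indicators, gains, powers))


# Three selected channels; the second one carries no data (I_i = 0).
indicators = [1, 0, 1]
gains = [1.0, 5.0, 3.0]
powers = [1.0, 2.0, 1.0]
rate = total_rate(indicators, gains, powers)
print(rate)  # log2(2) + 0 + log2(4)
```

Note that power spent on a channel with $I_i(t) = 0$ is wasted, which is one reason the sensing decision and the power allocation interact.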

3.5 Demand Model 

We will focus on elastic data traffic in this paper. Secondary
users randomly arrive at the network to request files with
random and finite file sizes (measured in the number of
packets) from the operator. A user will leave the network
once it has downloaded the complete requested file. The
operator can price the packet transmission dynamically over
time, which will affect the users' arrival rate. For example, a
higher price at peak time can discourage users from downloading
files, as they can wait until a later time with a lower price.
To model this, we use $M(t)$ to denote the random market
state, which can be measured precisely at the beginning of
each time slot $t$ and can help estimate the users' demand.^4 The
random variable is drawn from a finite set $\mathcal{M}$ over time in an
i.i.d. fashion. The distribution of $M(t)$ may not be known by
the operator.

3. This is the case where the users are located close by, and thus the
downlink channel condition from the base station to the users is user
independent.

At time $t$, the operator will decide whether to accept new
file downloading requests from newly arrived secondary users.
We define the binary demand control variable as $O(t)$,
where $O(t) = 1$ means that the operator accepts the incoming
requests in time $t$, and $O(t) = 0$ otherwise. When the operator
decides to accept new requests for packet transmissions, it will
also announce a price $q(t)$ for transmitting one packet
(to any user). This price will affect the users' incentives to
request downloads, e.g., when the price $q(t)$ is high, some users
may choose to postpone their requests.

More precisely, we denote the number of incoming users at
time $t$ as a discrete random variable
$N(t) = N(M(t), q(t)) \in \{0, 1, 2, \ldots\}$, the distribution of which is a
function of the transmission price $q(t)$ and market state $M(t)$. Further,
user $n$'s requested file size is denoted $L_n(t)$, with
$n \in \{1, 2, \ldots, N(M(t), q(t))\}$. The file sizes are assumed to be independent of
each other and do not depend on $q(t)$ or $M(t)$. Moreover,
we assume that users are using a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of
different applications, and denote $\theta_k$ as the probability that an
incoming user is using application $k \in \mathcal{K}$, with $\sum_{k=1}^{K} \theta_k = 1$.
The distributions of the file length for different applications
can be different, and we denote $l_k$ as the expected file length
of application $k \in \mathcal{K}$.

To summarize, users' instantaneous demand at time $t$ is

$$A(t) \triangleq \sum_{n=1}^{N(M(t), q(t))} L_n(t), \tag{7}$$

which is a random variable due to random file sizes and
the random number of incoming users (even given $q(t)$ and
$M(t)$). We define the users' (expected) demand function
as $D(t) = D(M(t), q(t)) = E[A(M(t), q(t))]$, and its
value is completely determined by $M(t)$ and $q(t)$. We can
calculate that $D(M(t), q(t)) = E[N(M(t), q(t))] \sum_{k \in \mathcal{K}} \theta_k l_k$.
It is reasonable to assume that the operator can rather
accurately characterize the expected number of incoming users
$E[N(M(t), q(t))]$ through long-term observations. Thus the
demand function $D(M(t), q(t))$ is known by the operator.
We further assume that the instantaneous demand is upper-
bounded as $A(t) \le A_{\max}$ for all $t$, and that the demand
function $D(t)$ is a non-negative and non-increasing function of
the price $q(t)$. When the price is higher than some upper
bound, i.e., $q(t) > q_{\max}$, the demand function $D(t)$ will be
zero. The optimization of $O(t)$ and $q(t)$ based on the demand
function will be further discussed in Section 5.2.1.
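The product form of the demand function can be illustrated with a small calculation. The sketch below multiplies an assumed expected number of arrivals by the mean file size averaged over applications; the function name and the numbers are hypothetical, and the product form relies on the file sizes being independent of the number of arrivals, as assumed in the text.

```python
def expected_demand(expected_users, app_probs, mean_file_lengths):
    """D(M, q) = E[N(M, q)] * sum_k theta_k * l_k (in packets)."""
    if abs(sum(app_probs) - 1.0) > 1e-9:
        raise ValueError("application probabilities must sum to 1")
    # Mean file size of a random arrival, averaged over applications.
    mean_file_size = sum(t * l for t, l in zip(app_probs, mean_file_lengths))
    return expected_users * mean_file_size


# Hypothetical example: 10 expected arrivals, two equally likely applications
# with mean file lengths of 2 and 4 packets.
demand = expected_demand(10.0, [0.5, 0.5], [2.0, 4.0])
print(demand)
```

Only the expected number of arrivals depends on the price and the market state, so the operator needs to learn a single scalar function per market state.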

4. For example, $M(t)$ can be users' willingness to pay, or whether the
system is in peak time or off-peak time.

3.6 Queuing dynamics

Since we focus on the profit maximization problem in this 
paper, we will take a simple view of the network and model 
users' dynamic arrivals and departures as a single server queue. 
When a user accesses the network, the corresponding file 
will be queued in a server at the base station, waiting to 
be transmitted to the user according to the First Come First 
Serve (FCFS) discipline. Shama and Lin in [29] showed that
the single server queue model is a good approximation for
an OFDM system, especially when the numbers of users and
channels are large.

We denote the queue length (i.e., the backlog, or the
number of all packets from all queued files) at time $t$ as $Q(t)$.
Thus the queuing dynamics can be written as

$$Q(t+1) = (Q(t) - r(t))^+ + O(t) A(t), \tag{8}$$

where $(a)^+ = \max(a, 0)$, $r(t)$ and $A(t)$ are the transmission
rate and incoming rate at time $t$, and $O(t)$ is the binary demand
control variable (i.e., $O(t) = 1$ means the operator admits the
users' transmission requests at time $t$). Throughout the paper,
we adopt the following notion of queue stability:

$$\bar{Q} \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[Q(\tau)] < \infty. \tag{9}$$
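The queue recursion (8) is straightforward to simulate. The following sketch steps the backlog through a short, hypothetical sequence of rates, admission decisions, and arrivals; the function names and the numbers are assumptions for illustration only.

```python
def queue_update(backlog, rate, admit, arrivals):
    """One step of (8): Q(t+1) = max(Q(t) - r(t), 0) + O(t) * A(t)."""
    return max(backlog - rate, 0.0) + admit * arrivals


def simulate_queue(rates, admits, arrivals, initial_backlog=0.0):
    """Return the backlog trajectory [Q(0), Q(1), ..., Q(T)]."""
    trajectory = [initial_backlog]
    for r, o, a in zip(rates, admits, arrivals):
        trajectory.append(queue_update(trajectory[-1], r, o, a))
    return trajectory


# Hypothetical three-slot example (packets per slot).
path = simulate_queue(rates=[2.0, 7.0, 1.0], admits=[1, 1, 0], arrivals=[5.0, 3.0, 4.0])
print(path)
```

In the third slot the arrivals are rejected ($O(t) = 0$), so the backlog only drains. A finite-horizon average of such a trajectory is a sample-based stand-in for the limit in (9).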

4 Problem Formulation 

For notational convenience, we introduce several condensed
notations and use them together with the original notations.

We define $\phi(t) = (M(t), h(t), C^l(t))$ as the observable
parameters, including the market state $M(t)$, channel conditions
(vector) $h(t)$, and the leasing price $C^l(t)$ in the spectrum
market. Based on previous assumptions, the $\phi(t)$ are i.i.d. over
time and take values from a finite set $\Phi$.

We define $\gamma(t) = (O(t), q(t), C^s(t), \mathcal{B}^s(t), \mathcal{B}^l(t), P(t))$ as the
decision variables, including the demand control variable
$O(t)$, the transmission price for users $q(t)$, the sensing cost
(with the corresponding sensing technology) $C^s(t)$, the set of
sensing channels $\mathcal{B}^s(t)$, the set of leasing channels $\mathcal{B}^l(t)$, and
the power allocations (vector) $P(t)$ of the operator. We assume
that $\gamma(t)$ takes values from a countable (finite or infinite) set
$\Gamma_{\phi(t)}$, which is a Cartesian product of the feasible regions
of all variables, i.e., non-negative values satisfying constraints
(1), (3), and (4). With the condensed notations, functions in
this paper can be simply represented as functions of $\gamma(t)$ with
parameter $\phi(t)$.

We further define the instantaneous profit in time $t$

$$R(t) \triangleq R(\gamma(t); \phi(t)) \triangleq q(t) O(t) A(t) - C^s(t) |\mathcal{B}^s(t)| - C^l(t) |\mathcal{B}^l(t)|. \tag{10}$$

The time average profit is denoted as

$$\bar{R} \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[R(\tau)].$$

All expectations in this paper are taken with respect to the system
parameters $\phi(t)$ unless stated otherwise.
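To make the profit definition (10) concrete, the sketch below evaluates the per-slot profit and a finite-horizon sample average. The function names and the numbers are hypothetical; the sample average is only a stand-in for the limiting time average in the text.

```python
def instantaneous_profit(price, admit, arrivals, sensing_cost, num_sensed,
                         leasing_price, num_leased):
    """R(t) in (10): revenue minus sensing and leasing costs."""
    revenue = price * admit * arrivals
    cost = sensing_cost * num_sensed + leasing_price * num_leased
    return revenue - cost


def time_average(values):
    """Finite-horizon sample average of a list of per-slot values."""
    return sum(values) / len(values)


# Hypothetical two-slot example.
profits = [
    instantaneous_profit(2.0, 1, 10.0, 0.5, 4, 3.0, 2),  # 20 - 2 - 6
    instantaneous_profit(2.0, 0, 10.0, 0.5, 4, 3.0, 0),  # 0 - 2 - 0
]
print(profits, time_average(profits))
```

The second slot shows that sensing costs are paid even when no requests are admitted, so the profit in a single slot can be negative.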

We look at the profit maximization problem through pricing
determination and resource allocation. At the beginning of
each time slot $t$, the operator observes the value of $\phi(t)$ and
makes a decision $\gamma(t)$ to maximize the time average profit,
subject to the system stability constraint (11) and the collision
upper-bound requirement (12). The Profit Maximization (PM)
problem is formulated as

PM: Maximize $\bar{R}$
Subject to $\bar{Q} < \infty$, (11)
$\bar{X}_i \le \eta_i, \ \forall i \in \mathcal{B}^s_{\max}$, (12)
Variables $\gamma(t) \in \Gamma_{\phi(t)}, \ \forall t$,
Parameters $\phi(t), \ \forall t$.

We represent its optimal solution as
$\gamma^*(t) = (O^*(t), q^*(t), C^{s*}(t), \mathcal{B}^{s*}(t), \mathcal{B}^{l*}(t), P^*(t))$, and denote
$R^*$ as the maximum profit. The PM problem is an infinite
horizon stochastic optimization problem, which is in general
hard to solve directly, especially when the distribution of the
dynamic parameter $\phi(t)$ is unknown. For example, the future
leasing price is hard to predict due to the dynamic supplies
and demands in the market, and the primary users' activities
cannot be estimated precisely beforehand.

5 Profit Maximization Control Policy 

Now, we adopt the Lyapunov stochastic optimization technique to
solve the PM problem.

5.1 Lyapunov stochastic optimization

We first introduce a virtual queue for constraint (12), and then
derive the optimal control policy to solve the PM problem
through the technique of drift-plus-penalty function
minimization [30].

We denote $Z_i(t)$ as the number of collisions happening
in sensing channel $i \in \mathcal{B}^s_{\max}$. The counter $Z_i(t)$ can be
understood as a "virtual queue", in which the incoming rate is
$X_i(t)$, and the serving rate is $\eta_i$ (the collision tolerance level).
The queue dynamic is

$$Z_i(t+1) = (Z_i(t) - \eta_i)^+ + X_i(t), \tag{13}$$

with $Z_i(0) = 0$. If the virtual queue is stable,
then the average incoming rate is no larger than
the average serving rate, which is exactly the collision
upper-bound constraint (12).
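The link between the virtual queue and the collision constraint can be checked on a sample path. Since (13) implies $Z_i(t+1) \ge Z_i(t) - \eta_i + X_i(t)$, summing over $T$ slots with $Z_i(0) = 0$ gives that the average number of collisions is at most $\eta_i + Z_i(T)/T$. The sketch below verifies this on a hypothetical collision record; the function names and data are assumptions for illustration.

```python
def virtual_queue_update(z, eta, collision):
    """One step of (13): Z(t+1) = max(Z(t) - eta, 0) + X(t)."""
    return max(z - eta, 0.0) + collision


def run_virtual_queue(collisions, eta):
    """Return Z(T) after processing the collision indicators, with Z(0) = 0."""
    z = 0.0
    for x in collisions:
        z = virtual_queue_update(z, eta, x)
    return z


# Hypothetical collision record for one channel over 10 slots.
collisions = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
eta = 0.25
z_final = run_virtual_queue(collisions, eta)
avg_collisions = sum(collisions) / len(collisions)
# Sample-path bound: average collisions <= eta + Z(T) / T.
print(avg_collisions, eta + z_final / len(collisions))
```

If $Z_i(T)$ grows more slowly than $T$, the correction term vanishes and the average collision rate is bounded by $\eta_i$, which is the intuition behind using the virtual queue.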

We introduce the general queue length vector
$\Theta(t) = \{Q(t), Z(t)\}$. We then define the Lyapunov function

$$L(\Theta(t)) = \frac{1}{2} \Big[ Q(t)^2 + \sum_{i \in \mathcal{B}^s_{\max}} Z_i(t)^2 \Big],$$

and the Lyapunov drift

$$\Delta(\Theta(t)) = E[L(\Theta(t+1)) - L(\Theta(t)) \mid \Theta(t)]. \tag{14}$$

According to the Lyapunov stochastic optimization
technique, we can obtain an instantaneous control policy that
solves the PM problem by minimizing an upper bound
of the following drift-plus-penalty function in every slot $t$:

$$\Delta(\Theta(t)) - V E[R(t) \mid \Theta(t)]. \tag{15}$$

There are two terms in the above function. The first term is
the Lyapunov drift defined in (14). It is shown by Lyapunov
stochastic optimization [30] that we can achieve the system
stabilities (i.e., constraints (11) and (12) of the PM problem)
by showing the existence of a constant upper bound for the
drift function. The second term in (15) is just the objective of
the PM problem: minimizing the negative profit
is equivalent to maximizing the profit. Here the parameter $V$ is
introduced to achieve the desired tradeoff between profit and
queuing delay in the control policy. We first find an upper
bound for (15).

By the queue dynamic (8), we have

$$Q(t+1)^2 \le (Q(t) - r(t))^2 + A(t)^2 + 2 Q(t) O(t) A(t) = Q(t)^2 + r(t)^2 + A(t)^2 + 2 Q(t) (O(t) A(t) - r(t)). \tag{16}$$

Similarly, for the virtual queue (13), we have

$$Z_i(t+1)^2 \le Z_i(t)^2 + \eta_i^2 + X_i(t)^2 + 2 Z_i(t) (X_i(t) - \eta_i). \tag{17}$$

Substituting (16) and (17) into (15), we have

$$\Delta(\Theta(t)) - V E[R(t) \mid \Theta(t)] \le D - D_1(t) - V E\big[q(t) O(t) A(t) - C^s(t) |\mathcal{B}^s(t)| - C^l(t) |\mathcal{B}^l(t)| \,\big|\, \Theta(t)\big] + Q(t) E[O(t) A(t) - r(t) \mid \Theta(t)] + \sum_{i \in \mathcal{B}^s_{\max}} Z_i(t) E[X_i(t) \mid \Theta(t)],$$

where $D$ is a positive constant satisfying the following condition
for all $t$,

$$D \ge \frac{1}{2} E\big[r(t)^2 + (O(t) A(t))^2 \,\big|\, \Theta(t)\big] + \frac{1}{2} \sum_{i \in \mathcal{B}^s_{\max}} E\big[X_i(t)^2 + \eta_i^2 \,\big|\, \Theta(t)\big],$$

and $D_1(t) = \sum_{i \in \mathcal{B}^s_{\max}} Z_i(t) \eta_i$ is a known constant at time $t$,
since the values of the $Z_i(t)$ are known at time $t$.
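The two per-slot bounds (16) and (17) follow from $((a)^+)^2 \le a^2$ and $(a)^+ \le$ the original backlog, and they can be sanity-checked numerically. The sketch below draws random values for the queue states and inputs and confirms that both inequalities hold; the sampling ranges, the function name, and the tolerance are arbitrary assumptions, and a random test is of course not a proof.

```python
import random


def check_squared_bounds(num_samples=10000, seed=0):
    """Randomly test the per-slot bounds (16) and (17); return True if all hold."""
    rng = random.Random(seed)
    tol = 1e-9  # slack for floating-point rounding
    for _ in range(num_samples):
        q = rng.uniform(0.0, 50.0)   # backlog Q(t)
        r = rng.uniform(0.0, 20.0)   # transmission rate r(t)
        a = rng.uniform(0.0, 20.0)   # arrivals A(t)
        o = rng.choice([0, 1])       # demand control O(t)
        z = rng.uniform(0.0, 10.0)   # virtual queue Z_i(t)
        eta = rng.uniform(0.0, 1.0)  # collision tolerance eta_i
        x = rng.choice([0, 1])       # collision indicator X_i(t)

        q_next = max(q - r, 0.0) + o * a   # update (8)
        z_next = max(z - eta, 0.0) + x     # update (13)

        bound_q = q**2 + r**2 + a**2 + 2.0 * q * (o * a - r)   # RHS of (16)
        bound_z = z**2 + eta**2 + x**2 + 2.0 * z * (x - eta)   # RHS of (17)

        if q_next**2 > bound_q + tol or z_next**2 > bound_z + tol:
            return False
    return True


print(check_squared_bounds())
```

Halving and summing these bounds, then taking conditional expectations, gives the drift bound stated above.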

To further simplify the above expression, we introduce two
new notations: channel selection $\mathcal{B}(t) = \mathcal{B}^s(t) \cup \mathcal{B}^l(t)$, and
channel cost

$$C_i(t) \triangleq \begin{cases} \tilde{C}^s_i(t), & \text{if } i \in \mathcal{B}^s(t), \\ C^l(t), & \text{if } i \in \mathcal{B}^l(t), \end{cases} \tag{18}$$

where $\tilde{C}^s_i(t)$ is the virtual sensing cost and is defined as
$\tilde{C}^s_i(t) = C^s(t) + (1/V) Z_i(t) E[X_i(t) \mid \Theta(t)]$. Note that this
virtual sensing cost depends not only on the sensing cost but
also on the collision history in this channel. More frequent past
collisions in this channel will increase the virtual sensing cost,
and hence make the operator more conservative about choosing
this sensing channel. It then follows:

$$\Delta(\Theta(t)) - V E[R(t) \mid \Theta(t)] \le D - D_1(t) + V E\Big[\Big(\frac{Q(t)}{V} - q(t)\Big) O(t) A(t) \,\Big|\, \Theta(t)\Big] + V E\Big[\sum_{i \in \mathcal{B}(t)} \Big(C_i(t) - \frac{Q(t) r_i(t)}{V}\Big) \,\Big|\, \Theta(t)\Big], \tag{19}$$

where we use the fact that collisions between secondary and
primary users can only happen in channels that are chosen
for sensing, i.e., $i \in \mathcal{B}^s(t)$. Next we propose the Profit
Maximization Control (PMC) policy to minimize the right
hand side of inequality (19) for each time $t$.



5.2 Profit Maximization Control (PMC) policy 

It is clear that minimizing the right hand side of (19) is
equivalent to minimizing the last two terms in (19), since the
remaining terms do not depend on the decisions at time $t$. Note that
the last two terms are decoupled in the decision variables, so the
PMC policy has the following two parallel parts:

5.2.1 Revenue Maximization

Here we determine two variables: the transmission price $q(t)$
and the demand control decision $O(t)$. The optimal
transmission price $q(t)$ is obtained by solving the following revenue
maximization problem:

Maximize $q(t) D(M(t), q(t)) - \frac{Q(t)}{V} D(M(t), q(t))$ (20)
Variables $q(t) \ge 0$

To obtain the above problem formulation of revenue 
maximization, we use the fact that the demand function 
D(M(t),q(t)) = E [A(t)], which is independent of the queu- 
ing states of the system. 

Note that the first term in (l20l is just the revenue that 
the operator collects from its users. The second term can be 
viewed as a shift of the queuing effect, which is introduced 
by the Lyapunov drift for system stability. 

If the maximum objective in < f2Qb (under the optimal choice 
of q(t)) is positive, the operator sets the demand control 
variable 0(f) = 1 and accepts the present incoming requests 
A(t) at the price q(t). Otherwise, the operator sets O(f) = 
and rejects any new requests. 
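As an illustration of this pricing rule, the following Python sketch solves (20) by a plain grid search over candidate prices and then applies the admission rule. The linear demand function, the price grid, and all numerical values are hypothetical and serve only as an example; the analysis does not prescribe a particular solver.

```python
from typing import Callable, Sequence, Tuple


def revenue_decision(
    queue_len: float,
    V: float,
    market_state: float,
    demand: Callable[[float, float], float],
    price_grid: Sequence[float],
) -> Tuple[float, int, float]:
    """Grid search for (20): maximize (q - Q/V) * D(q, M) over the grid.

    Returns (price, admission decision O in {0, 1}, objective value).
    """
    best_price, best_value = price_grid[0], float("-inf")
    for q in price_grid:
        value = (q - queue_len / V) * demand(q, market_state)
        if value > best_value:
            best_price, best_value = q, value
    # Admit new requests only if the best objective value is positive.
    admit = 1 if best_value > 0 else 0
    return best_price, admit, best_value


def linear_demand(q: float, m: float) -> float:
    """Hypothetical demand function, not the one used in the paper."""
    return max(0.0, m * (5.0 - q))


prices = [0.1 * k for k in range(51)]  # hypothetical price grid on [0, 5]

for backlog in (0.0, 200.0, 1000.0):
    print(backlog, revenue_decision(backlog, 100.0, 1.0, linear_demand, prices))
```

In this toy setting a longer queue shifts the chosen price upward, and a sufficiently long queue makes the objective non-positive, so new requests are rejected.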

5.2.2 Cost Minimization 

We determine the channel selection $\mathcal{B}(t)$, the sensing technology (or cost) $C^s(t)$, and the power allocation $P(t)$ by solving the following optimization problem, which controls the costs the operator incurs to provide transmission services to its users.

Minimize $\sum_{i \in \mathcal{B}(t)} \left( C_i(t) - \frac{Q(t)}{V} E[r_i(t) \mid \Theta(t)] \right)$    (21)

Subject to O,©,® 
Variables $C^s(t)$, $\mathcal{B}^s(t)$, $\mathcal{B}^l(t)$, $P_i(t) \ge 0$

To obtain the above formulation of cost minimization, we use the fact that $X_i(t) = (1 - S_i(t)) W_i(t)$, which is independent of the queuing state. Thus the virtual sensing cost can be updated as $\tilde{C}^s(t) = C^s(t) + (1/V) Z_i(t) (1 - p_0) P_{md}(C^s(t))$, which increases with the virtual queue and the missed detection probability.
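The update of the virtual sensing cost can be written as a one-line function. This is a minimal sketch; the function name and the sample numbers are illustrative assumptions.

```python
def virtual_sensing_cost(sensing_cost: float, Z_i: float, V: float,
                         p0: float, p_md: float) -> float:
    """Virtual sensing cost: C^s + (1/V) * Z_i * (1 - p0) * P_md(C^s)."""
    return sensing_cost + (Z_i / V) * (1.0 - p0) * p_md


# Hypothetical values: a longer virtual queue Z_i makes sensing look costlier.
for Z in (0.0, 50.0, 500.0):
    print(Z, virtual_sensing_cost(0.1, Z, 100.0, 0.6, 0.08))
```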

Note that the first term in the summation in (21) is the cost of each channel. The second term in the summation is a queuing-weighted expected transmission rate, which is again a shift introduced by the Lyapunov drift for system stability. This shift can also be viewed as the "gain" collected from the channel to help clear the queue.

5.2.3 Intuitions behind the PMC policy 

We discuss some intuitions behind the PMC policy. To maximize the profit, the operator needs to perform revenue maximization and cost minimization. To guarantee queuing stability, some shifts (i.e., all queue-related terms) are introduced by the Lyapunov drift in these problems. In Appendix D we show that the queueing effect increases the optimal price announced by the operator (compared with not considering queueing), as a higher price reduces the users' demands and maintains system stability.

The Lyapunov stochastic optimization approach provides 
a way to decompose a long-term average goal (e.g., the 
PM problem) into instantaneous optimization problems (e.g., 
revenue maximization and cost minimization problems in the 
PMC policy). In the stochastic optimization problem, the 
current decisions always have impacts on the future prob- 
lems. These impacts are characterized and incorporated by 
the queueing shift terms in the instantaneous optimization 
problems. Therefore, we can achieve the long term goal 
through focusing on the instantaneous decisions in every time 
slot. The flowchart for the PMC policy is illustrated in Fig. 2.

[Figure 2: flowchart with three blocks. "Initiation" sets the queues $Q(t)$ and $Z_i(t)$. "Updating" sets $t \leftarrow t + 1$ and updates the queues $Q(t)$ and $Z_i(t)$ by (8) and (13). "Profit Maximization Control (PMC) Policy" consists of revenue maximization (price and market control strategy) and a two-stage cost minimization: the first stage selects the sensing cost and the channels, the sensing results $W_i(t)$ are then observed, and the second stage computes the power allocation.]

Fig. 2. Flowchart of the dynamic PMC policy

Although the revenue maximization problem is relatively 
easy to solve, the cost minimization problem is very com- 
plicated. It is actually a two-stage decision problem. In the 
first stage, the operator determines the sensing technology, 
and chooses which channels to sense and which channels to 
lease. Then spectrum sensing is performed to identify available 
channels. With this information, the operator further allocates 
downlink transmission power in the available channels (sensed 
idle ones and leasing ones). In Section 5.3, we focus on
designing algorithms to solve the cost minimization problem.

5.3 Algorithms for Cost Minimization Problem

Now we use backward induction to solve the cost minimization
problem.

5.3.1 The Second Stage Problem

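We first analyze the power allocation in the second stage, where the sensing results $W_i(t)$, the channel selection $\mathcal{B}^s(t)$, and the sensing technology $C^s(t)$ have been determined. Therefore, the power allocation problem of (21) is as follows:

Maximize $\sum_{i \in \mathcal{B}(t)} \omega_i(t) \log(1 + h_i(t) P_i(t))$    (22)
Subject to $\sum_{i \in \mathcal{B}(t)} P_i(t) \le P_{\max}$
Variables $P_i(t) \ge 0$

where

$\omega_i(t) \triangleq \begin{cases} \omega_0 \triangleq E[S_i(t) \mid W_i(t) = 1], & \text{if } i \in \mathcal{B}^s(t), \\ 1, & \text{if } i \in \mathcal{B}^l(t), \end{cases}$    (23)

and we can calculate

$E[S_i(t) \mid W_i(t) = 1] = \frac{p_0 (1 - P_{fa}(C^s(t)))}{p_0 (1 - P_{fa}(C^s(t))) + (1 - p_0) P_{md}(C^s(t))}.$    (24)

By using Lagrange duality theory, we can show that problem (22) has the following optimal solution:

$P_i(t) = \begin{cases} \left( \frac{\omega_i(t)}{\lambda(t)} - \frac{1}{h_i(t)} \right)^+, & i \in \mathcal{B}(t), \\ 0, & i \notin \mathcal{B}(t), \end{cases}$    (25)

where $\lambda(t)$ is the Lagrange multiplier of the total power constraint. The optimal value of $\lambda(t)$ is given by the following water-filling solution:

$\lambda(t) = \frac{\sum_{i \in \mathcal{B}_p(t)} \omega_i(t)}{P_{\max} + \sum_{i \in \mathcal{B}_p(t)} \frac{1}{h_i(t)}},$    (26)

where $\mathcal{B}_p(t) = \{i \in \mathcal{B}(t) : P_i(t) > 0\}$. Note that (26) is a fixed-point equation in $\lambda(t)$, and the precise value of $\lambda(t)$ is not given here.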
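The step from problem (22) to the solution in (25) and the water level in (26) can be sketched as follows. The time index is dropped, and the total power constraint is assumed to hold with equality at the optimum, which is the case because the objective is increasing in every $P_i$.

```latex
\begin{align*}
L(P, \lambda) &= \sum_{i \in \mathcal{B}} \omega_i \log(1 + h_i P_i)
  - \lambda \Big( \sum_{i \in \mathcal{B}} P_i - P_{\max} \Big), \\
\frac{\partial L}{\partial P_i} = 0
  \;&\Rightarrow\; \frac{\omega_i h_i}{1 + h_i P_i} = \lambda
  \;\Rightarrow\; P_i = \frac{\omega_i}{\lambda} - \frac{1}{h_i}
  \quad \text{(whenever this value is positive)}, \\
\sum_{i \in \mathcal{B}_p} \Big( \frac{\omega_i}{\lambda} - \frac{1}{h_i} \Big) = P_{\max}
  \;&\Rightarrow\; \lambda = \frac{\sum_{i \in \mathcal{B}_p} \omega_i}
  {P_{\max} + \sum_{i \in \mathcal{B}_p} 1/h_i}.
\end{align*}
```

A channel receives positive power exactly when $\omega_i h_i > \lambda$, which is why the search can proceed over channels sorted by $\omega_i h_i$.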

When the values of all parameters (i.e., $h_i(t)$, $\omega_i(t)$) are given, we can use a simple water-level search, Algorithm 1 (similar to the search algorithms in [32], [33]), to determine the exact optimal value of $\lambda(t)$. In the following pseudo code of Algorithm 1, we define a function $\Lambda(m)$ as follows:

$\Lambda(m) = \frac{\sum_{i=1}^{m} \omega_i}{P_{\max} + \sum_{i=1}^{m} \frac{1}{h_i}}.$

Algorithm 1 Power Allocation

Rearrange the channel indices $i \in \mathcal{B}(t)$ in decreasing order of $\omega_i(t) h_i(t)$
$m \leftarrow |\mathcal{B}(t)|$, $\lambda \leftarrow \Lambda(m)$
while $\lambda > h_m(t) \omega_m(t)$ do
    $m \leftarrow m - 1$
    $\lambda \leftarrow \Lambda(m)$
end while

The main complexity in this algorithm is to sort the channels according to the channel gains. We can adopt established sorting algorithms [34] to obtain the index rearrangement with a complexity of $O(|\mathcal{B}_{\max}| \log(|\mathcal{B}_{\max}|))$. Thus the total complexity of Algorithm 1 is $O(|\mathcal{B}_{\max}|^3 \log(|\mathcal{B}_{\max}|))$.
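The water-level search of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration that assumes strictly positive gains and weights; the function name and the sample inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def water_filling(omega: Sequence[float], h: Sequence[float],
                  p_max: float) -> Tuple[float, List[float]]:
    """Weighted water-filling for: max sum_i omega_i*log(1 + h_i*P_i), sum_i P_i <= p_max.

    Returns the multiplier lambda and the per-channel powers (original order).
    Assumes omega_i > 0, h_i > 0 and p_max > 0.
    """
    n = len(h)
    # Sort channel indices by omega_i * h_i, best channel first.
    order = sorted(range(n), key=lambda i: omega[i] * h[i], reverse=True)

    def level(m: int) -> float:
        """Lambda(m): water level if only the best m channels are active."""
        active = order[:m]
        return sum(omega[i] for i in active) / (p_max + sum(1.0 / h[i] for i in active))

    m = n
    lam = level(m)
    # Drop the weakest active channel while it would receive no power.
    while m > 1 and lam > omega[order[m - 1]] * h[order[m - 1]]:
        m -= 1
        lam = level(m)

    power = [0.0] * n
    for i in order[:m]:
        power[i] = max(0.0, omega[i] / lam - 1.0 / h[i])
    return lam, power


lam, power = water_filling([1.0, 1.0], [1.0, 0.1], p_max=1.0)
print(lam, power)
```

In the sample call the second channel is too weak to be used, so all of the power goes to the first channel.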

5.3.2 The First Stage Problem 

Let us consider the first stage problem, which determines the sensing technology, the sensing set, and the leasing set. Note that since sensing has not yet been performed at this stage, the sensing result $W_i(t)$ is not known. We denote

$\alpha_i(t) \triangleq \begin{cases} \alpha_0 \triangleq E[S_i(t) W_i(t)], & \text{if } i \in \mathcal{B}^s(t), \\ 1, & \text{if } i \in \mathcal{B}^l(t), \end{cases}$    (27)

and we can calculate

$E[S_i(t) W_i(t)] = p_0 (1 - P_{fa}(C^s(t))).$    (28)

Substituting the optimal power allocation (25) into problem (21), we have

Minimize $\sum_{i \in \mathcal{B}(t)} \left( C_i(t) - \frac{Q(t)}{V} \alpha_i(t) \left( \log \frac{\omega_i(t) h_i(t)}{\lambda(t)} \right)^+ \right)$    (29)
Subject to $\mathcal{B}^s(t) \subseteq \mathcal{B}^s_{\max}$, $\mathcal{B}^l(t) \subseteq \mathcal{B}^l_{\max}$, $C^s(t) \in \mathcal{C}^s$
Variables $\mathcal{B}^s(t)$, $\mathcal{B}^l(t)$, $C^s(t)$
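The two sensing-related weights that enter the problem above, the posterior in (24) and the probability in (28), are simple to evaluate. The sketch below computes both for the three sensing technologies and the idle probability $p_0 = 0.6$ listed in the simulation section; the function names are illustrative.

```python
def idle_posterior(p0: float, p_fa: float, p_md: float) -> float:
    """omega_0 as in (24): probability that a channel sensed idle is truly idle."""
    idle_and_sensed_idle = p0 * (1.0 - p_fa)
    busy_but_sensed_idle = (1.0 - p0) * p_md
    return idle_and_sensed_idle / (idle_and_sensed_idle + busy_but_sensed_idle)


def usable_probability(p0: float, p_fa: float) -> float:
    """alpha_0 as in (28): probability that a channel is idle and sensed idle."""
    return p0 * (1.0 - p_fa)


P0 = 0.6
# sensing cost -> (false alarm probability, missed detection probability)
TECHNOLOGIES = {0.0: (0.5, 0.5), 0.1: (0.1, 0.08), 0.5: (0.008, 0.005)}

for cost, (p_fa, p_md) in TECHNOLOGIES.items():
    print(cost, idle_posterior(P0, p_fa, p_md), usable_probability(P0, p_fa))
```

For the zero-cost technology ($P_{fa} = P_{md} = 0.5$) the posterior equals the prior $p_0$, which reflects that a random guess carries no information about the channel state.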

We first consider the above problem for a fixed sensing cost $C^s(t)$. This is a combinatorial optimization problem over $\mathcal{B}^s(t)$ and $\mathcal{B}^l(t)$. The worst-case search complexity (i.e., exhaustive search) is $O(2^{|\mathcal{B}_{\max}|})$, exponential in the total number of channels.

However, we can reduce the complexity by exploiting the special structure of this problem.

Proposition 1: (Threshold Property)

• We rearrange the leasing channel indices $i \in \mathcal{B}^l_{\max}$ in decreasing order of $g_i(t)$, which is defined as

$g_i(t) = h_i(t) \exp\left( -\frac{C_i(t)}{Q(t)/V} \right).$    (30)

There exists a threshold index $i^l_{th}$ such that a channel $i$ is chosen for leasing (i.e., $i \in \mathcal{B}^l(t)$) if and only if $i \le i^l_{th}$.

• We rearrange the sensing channel indices $j \in \mathcal{B}^s_{\max}$ in decreasing order of $g_j(t)$, which is defined as

$g_j(t) = \omega_j(t) h_j(t) \exp\left( -\frac{C_j(t)}{\alpha_j(t) Q(t)/V} \right).$    (31)

For all sensing channels $j \in \mathcal{B}^s_{\max}$, there exists a threshold index $j^s_{th}$ such that a channel $j$ is chosen for sensing (i.e., $j \in \mathcal{B}^s(t)$) if and only if $j \le j^s_{th}$.

Proof: Each channel in the optimal channel selection set, $i \in \mathcal{B}^*(t)$, satisfies the following condition:

$C_i(t) \le \frac{Q(t)}{V} \alpha_i(t) \log \frac{\omega_i(t) h_i(t)}{\lambda(t)}.$    (32)

This result is easy to see from the objective function in (29): to optimize the profit, we should only pick a channel whose cost is no larger than its gain. Thus by (30) and (31), the optimization problem in (29) can be written in the following equivalent form:

Maximize $\sum_{i \in \mathcal{B}(t)} \alpha_i(t) \left( \log \frac{g_i(t)}{\lambda(t)} \right)^+$    (33)
Subject to $\mathcal{B}^s(t) \subseteq \mathcal{B}^s_{\max}$, $\mathcal{B}^l(t) \subseteq \mathcal{B}^l_{\max}$
Variables $C^s(t)$, $\mathcal{B}^s(t)$, $\mathcal{B}^l(t)$

Thus by the log function in the objective of (33), the threshold property immediately follows. □
This proposition suggests that we should select the channels with a large $g_i$ (for leasing channels) or $g_j$ (for sensing channels). Note that as defined in (30) and (31), $g_i$ and $g_j$ are equal to the channel information (i.e., $h_i$ for leasing channels, $\omega_j h_j$ for sensing channels) multiplied by a decaying factor related to the channel cost. They can be understood as virtual channel gains that take channel costs into consideration. A large value of $g_i$ or $g_j$ means that the channel is cost-effective, i.e., the channel has a good channel gain as well as a low cost.

By Proposition 1, it is clear that we can obtain the optimal channel selection by an exhaustive search over the sensing and leasing thresholds. Algorithm 2 gives the pseudo code for the search procedure.

Algorithm 2 Optimal Channel Selection (for a given $C^s(t)$)

1: procedure Computing $\mathcal{B}(C^s(t))$
2:   invoke procedure SearchingThreshold($C^s$) to calculate $T^l$ and $T^s$
3:   $U(C^s(t)) \leftarrow 0$
4:   for $i = 0, 1, \ldots, T^l$ do
5:     for $j = 0, 1, \ldots, T^s$ do
6:       Calculate $\lambda(t)$ as in (26) with $\mathcal{B}_p(t) = \mathcal{B}^l_i \cup \mathcal{B}^s_j$
7:       if $g_i(t) > \lambda(t)$ and $g_j(t) > \lambda(t)$ then
8:         Calculate $U(i,j)$
9:         if $U(i,j) < U(C^s(t))$ then
10:          $U(C^s(t)) \leftarrow U(i,j)$
11:          $\mathcal{B}(C^s(t)) \leftarrow \mathcal{B}^l_i \cup \mathcal{B}^s_j$
12:        end if
13:      end if
14:    end for
15:  end for
16: end procedure
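The double loop over thresholds can be sketched in Python as follows. This is an illustrative reading of the procedure, not a definitive implementation: it scores each pair of thresholds by the objective of (33) and keeps the largest value, it searches all thresholds rather than first bounding them, and the `Channel` record and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Channel:
    g: float      # virtual channel gain, as in (30) or (31)
    h: float      # channel gain
    omega: float  # sensing posterior; 1.0 for a leasing channel
    alpha: float  # availability weight; 1.0 for a leasing channel


def water_level(chosen: List[Channel], p_max: float) -> float:
    """Water level as in (26), with every chosen channel assumed active."""
    return sum(c.omega for c in chosen) / (p_max + sum(1.0 / c.h for c in chosen))


def search_thresholds(leasing: List[Channel], sensing: List[Channel],
                      p_max: float) -> Tuple[float, int, int]:
    """Return (best objective value, leasing threshold, sensing threshold)."""
    leasing = sorted(leasing, key=lambda c: c.g, reverse=True)
    sensing = sorted(sensing, key=lambda c: c.g, reverse=True)
    best = (0.0, 0, 0)  # choosing no channel gives objective 0
    for i in range(len(leasing) + 1):
        for j in range(len(sensing) + 1):
            chosen = leasing[:i] + sensing[:j]
            if not chosen:
                continue
            lam = water_level(chosen, p_max)
            # Every chosen channel must have virtual gain above the water level.
            if any(c.g <= lam for c in chosen):
                continue
            value = sum(c.alpha * math.log(c.g / lam) for c in chosen)
            if value > best[0]:
                best = (value, i, j)
    return best


lease = [Channel(g=1.5, h=2.0, omega=1.0, alpha=1.0),
         Channel(g=0.05, h=1.0, omega=1.0, alpha=1.0)]
sense = [Channel(g=1.2, h=2.0, omega=0.9, alpha=0.5)]
print(search_thresholds(lease, sense, p_max=1.0))
```

The two nested loops visit at most $(|\mathcal{B}^l_{\max}| + 1)(|\mathcal{B}^s_{\max}| + 1)$ threshold pairs, in contrast to the exponential number of subsets in an exhaustive search.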

In Algorithm 2, $U(i,j)$ denotes the optimal value of (33) with the channel selection set $\mathcal{B} = \mathcal{B}^l_i \cup \mathcal{B}^s_j$. To decrease the number of search loops, we can first run Algorithm 5 (in the Appendix) to determine the maximum possible thresholds $T^l$ for leasing channels and $T^s$ for sensing channels. (If we do not run Algorithm 5, we can just set $T^l = |\mathcal{B}^l_{\max}|$ and $T^s = |\mathcal{B}^s_{\max}|$. Whether we run Algorithm 5 or not, the complexity of Algorithm 2 is no worse than $O(|\mathcal{B}^s_{\max}| \times |\mathcal{B}^l_{\max}|)$.) Thus the search complexity is reduced to $O(|\mathcal{B}_{\max}|^2)$, given that the channel indices are rearranged as in Proposition 1. We can adopt established sorting algorithms [34] to obtain the index rearrangement with a complexity of $O(|\mathcal{B}_{\max}| \log(|\mathcal{B}_{\max}|))$. Thus the total complexity of finding the optimal channel selection is $O(|\mathcal{B}_{\max}|^3 \log(|\mathcal{B}_{\max}|))$.

Note that in real systems, the channel conditions and the leasing cost may not change as frequently as every time slot. We can usually update these network parameters every time frame (which is composed of several time slots instead of one). Accordingly, the above algorithm will also be operated on a per-frame basis, which greatly reduces the computational complexity in practice.

Furthermore, let us find the optimal sensing cost $C^s(t)$ by enumerating all possible sensing costs $C^s(t) \in \mathcal{C}^s$. For the sensing cost $C^s(t)$, we denote the objective value in (33) as $U(C^s(t))$ and the optimal channel selection set as $\mathcal{B}(C^s(t))$. The corresponding pseudo code is given in Algorithm 3, the complexity of which is $O(|\mathcal{C}^s| \times |\mathcal{B}_{\max}|^3 \log(|\mathcal{B}_{\max}|))$.

So far, we have completely solved the two-stage optimization problem in (21). For each time $t$, the operator first runs Algorithm 3 to choose the channel sets $\mathcal{B}^*(t) = \mathcal{B}^{s*}(t) \cup \mathcal{B}^{l*}(t)$. Then it uses the sensing technology with cost $C^{s*}(t)$ to sense the channels in $\mathcal{B}^{s*}(t)$. Based on the sensing results, it further runs Algorithm 1 to determine the power allocation $P_i^*(t)$, $i \in \mathcal{B}^*(t)$.

Algorithm 3 Optimal Sensing Cost and Channel Selection

1: $U^* \leftarrow 0$
2: for $C^s(t) \in \mathcal{C}^s$ do
3:   Determine the optimal channel selection $\mathcal{B}(C^s(t))$ (see Algorithm 2)
4:   Calculate $U(C^s(t))$
5:   if $U^* > U$ then
6:     $U^* \leftarrow U$, $C^{s*}(t) \leftarrow C^s(t)$, $\mathcal{B}^*(t) \leftarrow \mathcal{B}(C^s(t))$
7:   end if
8: end for



5.3.3 Sensing vs. Leasing

We are also interested in how the PMC policy makes the best tradeoff between sensing and leasing based on the sensing cost $C^s(t)$ and the leasing cost $C^l(t)$. To make the comparison easy to understand, we consider perfect sensing with no sensing errors (i.e., $\omega_0 = 1$ and $\alpha_0 = p_0$). We further assume that a leasing channel $i$ and a sensing channel $j$ have the same channel gain $h_i(t) = h_j(t)$. Finally, we assume that the two channels have the same availability-price ratio, i.e., the costs satisfy $C^s(t) = \alpha_0 C^l(t)$. We want to answer the following question: is the PMC policy indifferent between the two channels?

By (30) and (31), we have $g_i = g_j$ for these two channels. By (33), we can calculate the net gains of channels $i$ and $j$: $\left( \log \frac{g_i}{\lambda} \right)^+ > \alpha_0 \left( \log \frac{g_j}{\lambda} \right)^+$. To maximize the objective in (33), it is clear that the PMC policy will prefer the leasing channel $i$ over the sensing channel $j$, and this tendency increases as $\alpha_0$ decreases. If we view the channel unavailability $(1 - \alpha_0)$ as the risk of choosing the sensing channel, then the PMC policy is risk averse. This is mainly due to the concavity of the rate function. This preference order also holds in the imperfect sensing case, in which we have $g_i > g_j$ and $\left( \log \frac{g_i}{\lambda} \right)^+ > \alpha_0 \left( \log \frac{g_j}{\lambda} \right)^+$.

5.4 Performance of the PMC Policy

We can characterize the performance of the PMC policy as follows:

Theorem 1: For any positive value $V$, the PMC policy has the following properties:

(a) The queue stability (11) and collision constraints (12) are satisfied. The queue length is upper bounded by

$Q(t) \le Q_{\max} \triangleq V q_{\max} + A_{\max}, \forall t,$    (34)

and the virtual queue length is upper bounded by

$Z_i(t) \le Z_{\max} \triangleq \kappa (V q_{\max} + A_{\max}) + 1, \forall i, t,$    (35)

where $\kappa = \frac{r_{\max} p_0 (1 - P_{fa}(C^{s0}))}{(1 - p_0) P_{md}(C^{s0})}$ and $C^{s0} \triangleq \arg\max_{C^s} \frac{p_0 (1 - P_{fa}(C^s))}{(1 - p_0) P_{md}(C^s)}$.

(b) The average profit $R_{PMC}$ obtained by the PMC policy satisfies

$\liminf R_{PMC} \ge R^* - O(1/V),$    (36)

where $R^*$ is the optimal value of the PM problem.

According to Little's law, the average queuing delay is proportional to the queue length. Thus users experience bounded queuing delays under the PMC algorithm by (34). By (36), we find that the profit obtained by the PMC policy can be made closer to the optimal profit by increasing $V$. However, as $V$ increases, the queuing delay also increases, as shown in (34). The best choice of $V$ depends on the desired trade-off between queuing delay and profit optimality.

A detailed proof of Theorem 1 is provided in Appendices A and E.
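The trade-off governed by $V$ can be tabulated directly from the queue bound in (34) and the $O(1/V)$ profit gap in (36). The sketch below is only illustrative: the values of $q_{\max}$ and $A_{\max}$ are hypothetical, and $1/V$ is printed as the order of the profit gap with its constant factor left unspecified.

```python
def queue_length_bound(V: float, q_max: float, a_max: float) -> float:
    """Worst-case backlog from (34): Q_max = V * q_max + A_max."""
    return V * q_max + a_max


Q_PRICE_MAX = 5.0  # hypothetical maximum price q_max
A_MAX = 20.0       # hypothetical maximum arrival per slot A_max

for V in (5, 10, 50, 100, 200):
    # Second column grows linearly in V; third column shrinks like 1/V.
    print(V, queue_length_bound(V, Q_PRICE_MAX, A_MAX), 1.0 / V)
```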



5.5 Extension: More General Model of Primary Activities

In the previous analysis, we have assumed that primary users' activities in each sensing channel follow a simple i.i.d. Bernoulli random process. Next we show that the PMC policy can be easily adapted to the more general Markov chain model of the primary users' activities shown in Fig. 3. In this model, for any time $t$, $S_i(t)$ is unknown, but the history information $S_i(t-1)$ is known, and the transition probabilities $\Pr(S_i(t) = s' \mid S_i(t-1) = s) = p^i_{s \to s'}$, $s \in \{0,1\}$, $s' \in \{0,1\}$, $i \in \mathcal{B}^s_{\max}$, are known from long-time statistics.

[Figure 3: state-transition diagram of the two-state Markov chain.]

Fig. 3. Markov chain model of the PUs' activities

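All previous analysis for the PMC policy still holds if we update the two parameters $\omega_i(t)$ and $\alpha_i(t)$ as follows:

$\omega_i(t) \triangleq \begin{cases} E[S_i(t) \mid W_i(t) = 1, S_i(t-1)], & \text{if } i \in \mathcal{B}^s(t), \\ 1, & \text{if } i \in \mathcal{B}^l(t), \end{cases}$

where

$E[S_i(t) \mid W_i(t) = 1, S_i(t-1) = s] = \frac{p^i_{s \to 1} (1 - P_{fa}(C^s(t)))}{p^i_{s \to 1} (1 - P_{fa}(C^s(t))) + p^i_{s \to 0} P_{md}(C^s(t))},$

and

$\alpha_i(t) \triangleq \begin{cases} E[S_i(t) W_i(t) \mid S_i(t-1)], & \text{if } i \in \mathcal{B}^s(t), \\ 1, & \text{if } i \in \mathcal{B}^l(t), \end{cases}$

where

$E[S_i(t) W_i(t) \mid S_i(t-1) = s] = p^i_{s \to 1} (1 - P_{fa}(C^s(t))).$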
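Under the Markov model, the i.i.d. idle probability $p_0$ is simply replaced by the transition probability into the idle state, given the state observed in the previous slot. A minimal sketch, with a hypothetical function name and sample values, and assuming the denominator is nonzero:

```python
from typing import Tuple


def markov_sensing_parameters(p_to_idle: float, p_fa: float,
                              p_md: float) -> Tuple[float, float]:
    """Return (omega, alpha) for a sensing channel under the Markov model.

    p_to_idle is Pr(S_i(t) = 1 | S_i(t-1) = s) for the last observed state s,
    where S_i(t) = 1 means the channel is idle.
    """
    p_to_busy = 1.0 - p_to_idle
    alpha = p_to_idle * (1.0 - p_fa)
    omega = alpha / (alpha + p_to_busy * p_md)
    return omega, alpha


# Hypothetical transition probabilities: a channel that was idle in the last
# slot is more likely to be idle now than one that was busy.
print(markov_sensing_parameters(0.9, 0.1, 0.08))  # last slot idle
print(markov_sensing_parameters(0.3, 0.1, 0.08))  # last slot busy
```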
There is no change in the revenue maximization part, and the cost minimization part still involves a combinatorial optimization problem. But the complexity of solving the cost minimization problem becomes $O(|\mathcal{C}^s| \times 2^{|\mathcal{B}^s_{\max}|} (|\mathcal{B}^l_{\max}| + 1))$, since we lose the structure information in sensing channels, i.e., the threshold structure does not hold for sensing channels. In the worst case ($\mathcal{B}_{\max} = \mathcal{B}^s_{\max}$, $\mathcal{B}^l_{\max} = \emptyset$), it comes back to $O(|\mathcal{C}^s| \times 2^{|\mathcal{B}_{\max}|})$, which is the complexity of the exhaustive search without considering the threshold structure.

Fig. 4. Heterogeneous user model: Users in a hexagon are nearby homogeneous users, who have the same channel experience. Users in different hexagons can have different channel experiences.

Let us further consider a special Markov chain model where the transition probability for each sensing channel is the same, i.e., $\Pr(S_i(t) = s' \mid S_i(t-1) = s) = p_{s \to s'}$, $\forall i \in \mathcal{B}^s_{\max}$. In this model, all sensing channels can be categorized into two types: channels that were busy in the last slot (i.e., $S_i(t-1) = 0$) and channels that were idle in the last slot (i.e., $S_i(t-1) = 1$). We can still show threshold structures for both types. Thus the complexity is reduced to $O(|\mathcal{C}^s| \times |\mathcal{B}_{\max}|^4 \log(|\mathcal{B}_{\max}|))$.

The above analysis shows that it is critical to exploit the problem structure to reduce the algorithm complexity.



6 Heterogeneous Users 

In Section 4, we adopted the single-queue analysis for homogeneous users, who are assumed to be located nearby and have the same channel condition on each channel. However, the single-queue analysis no longer works for the more general scenario of heterogeneous users, where users can be located at different places and have different channel conditions. In this section, we introduce the multi-queue model to deal with the heterogeneous user scenario, as shown in Fig. 4.

We divide the total coverage of the secondary base station into $\mathcal{J} = \{1, 2, \ldots, J\}$ disjoint small areas (illustrated as hexagons in Fig. 4) according to users' different channel experiences. Users in one of these small areas are nearby homogeneous users. They share the same channel conditions and form a queue based on the FCFS discipline. We use $Q_j(t)$ to denote the queue length in area $j$. Since queues and areas are in one-to-one correspondence, we also refer to the users in area $j$ as queue-$j$ users.

For each queue $j \in \mathcal{J}$, $h_{ij}(t)$ represents the users' channel gain on channel $i \in \mathcal{B}_{\max}$ at time $t$, which follows an i.i.d. distribution over time. The indicator variable $T_{ij}(t) \in \{0,1\}$ denotes the operator's channel assignment at time $t$: $T_{ij}(t) = 1$ if channel $i$ is allocated to queue $j$, and $T_{ij}(t) = 0$ otherwise. Meanwhile, the assignment $T_{ij}(t)$ must satisfy

$\sum_{j \in \mathcal{J}} T_{ij}(t) \le 1, \forall t.$    (37)

The power allocation for queue $j$ on channel $i$ is denoted by $P_{ij}(t)$. The total power allocation must satisfy

$\sum_{i \in \mathcal{B}(t)} \sum_{j \in \mathcal{J}} P_{ij}(t) \le P_{\max}, \forall t.$    (38)

Thus the rate of queue $j \in \mathcal{J}$ can be calculated as

$r_j(t) = \sum_{i \in \mathcal{B}(t)} I_i(t) T_{ij}(t) \log(1 + h_{ij}(t) P_{ij}(t)),$    (39)

where $I_i(t)$ is the transmission result in channel $i$, following the same definition as in Section 3.4.

For each queue $j$, we follow the same demand model as in Section 3.3 for homogeneous users. We assume that the number of incoming users, the market state, and the user's instantaneous demand are i.i.d. among different queues, and denote them as $N_j(t)$, $M_j(t)$, and $A_j(t)$, respectively. The market control variable and price for queue $j$ are denoted as $O_j(t)$ and $q_j(t)$. The queuing dynamic for queue $j$ is as follows:

$Q_j(t+1) = (Q_j(t) - r_j(t))^+ + O_j(t) A_j(t).$    (40)

Thus the homogeneous user model in Section 4 can also be viewed as a special case of the heterogeneous user model, where $\mathcal{J}$ is a singleton.
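The per-queue dynamics in (40) translate directly into code. The following is a minimal sketch with hypothetical sample values; each list holds one entry per queue.

```python
from typing import List, Sequence


def update_queues(Q: Sequence[float], r: Sequence[float],
                  O: Sequence[int], A: Sequence[float]) -> List[float]:
    """One slot of (40): Q_j(t+1) = max(Q_j(t) - r_j(t), 0) + O_j(t) * A_j(t)."""
    return [max(q - rate, 0.0) + admit * arrival
            for q, rate, admit, arrival in zip(Q, r, O, A)]


# Queue 1 admits new demand; queue 2 rejects it and is served until empty.
print(update_queues([5.0, 1.0], [2.0, 3.0], [1, 0], [4.0, 7.0]))
```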

6.1 Multi-queue Profit Maximization Control Policy 

6.1.1 Revenue Maximization 

For any queue $j \in \mathcal{J}$, we compute the optimal transmission price $q_j(t)$ by solving the following problem:

Maximize $\left( q_j(t) - \frac{Q_j(t)}{V} \right) D(q_j(t), M_j(t))$    (41)
Variables $q_j(t) \ge 0$

If the maximum objective in (41) is positive, the operator sets the transmission control variable $O_j^*(t) = 1$ and accepts users' new file download requests at the price $q_j^*(t)$. Otherwise, the operator sets $O_j^*(t) = 0$ and rejects any new requests.

6.1.2 Cost Minimization 

We solve the following optimization problem to determine the sensing cost and resource allocation:

Minimize $\sum_{i \in \mathcal{B}(t)} C_i(t) - \sum_{j \in \mathcal{J}} \frac{Q_j(t)}{V} E[r_j(t)]$    (42)
Subject to 0, ©, @g) > 03 

Variables $C^s(t)$, $\mathcal{B}^s(t)$, $\mathcal{B}^l(t)$, $P_{ij}(t) \ge 0$, $T_{ij}(t) \in \{0, 1\}$

where the cost $C_i(t)$ follows the same definition as in (18). Similar to the homogeneous model in Section 5.3, it is a two-stage decision problem. In the first stage, we determine the sensing technology, and choose the sensing channels and the leasing channels. Then in the second stage, we determine the channel assignment and power allocation based on the sensing results. We use backward induction to solve problem (42). To simplify the notation, we omit the time index in the following analysis.

We first analyze the channel assignment and power allocation in the second stage. In this stage, since the sensing results $W_i$, the channels $\mathcal{B}^s$, and the sensing technology $C^s$ are determined, the second stage problem of (42) is as follows:

Maximize $\sum_{j \in \mathcal{J}} \sum_{i \in \mathcal{B}} Q_j \omega_i T_{ij} \log\left(1 + \frac{h_{ij} P_{ij}}{T_{ij}}\right)$    (43)
Subject to (37), (38)
Variables $P_{ij} \ge 0$, $T_{ij} \in \{0, 1\}$

where $\omega_i$ follows the same definition as in (23).

Compared to the power allocation problem in (22), the binary channel assignment variables $T_{ij}$ make the second stage problem in (43) much more complex.

We first solve problem (43) assuming fixed $T_{ij}$'s, in which case the power allocation problem is a convex optimization problem. Following the same method used to solve the power allocation for homogeneous users in (25), we have

$P_{ij} = T_{ij} Q_j \omega_i \left( \frac{1}{\lambda} - \frac{1}{Q_j \omega_i h_{ij}} \right)^+, \quad i \in \mathcal{B},$    (44)

where $\lambda$ is the Lagrange multiplier of the total power constraint (38), which satisfies

$\lambda = \frac{\sum_{i \in \mathcal{B}_p} \sum_{j \in \mathcal{J}} T_{ij} Q_j \omega_i}{P_{\max} + \sum_{i \in \mathcal{B}_p} \sum_{j \in \mathcal{J}} \frac{T_{ij}}{h_{ij}}},$    (45)

where $\mathcal{B}_p = \{i \in \mathcal{B} : P_{ij} > 0\}$. When $T_{ij}$ is known, we can design a simple search algorithm similar to Alg. 1 to determine the optimal value of $\lambda$.

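We then substitute this result into (43) and further maximize the objective over the $T_{ij}$'s. Then we have

Maximize $\sum_{i \in \mathcal{B}} \omega_i \sum_{j \in \mathcal{J}} T_{ij} Q_j \left( \log \frac{\omega_i Q_j h_{ij}}{\lambda} \right)^+$    (46)
Subject to $\sum_{j \in \mathcal{J}} T_{ij} \le 1$
Variables $T_{ij} \in \{0,1\}$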
For each channel $i \in \mathcal{B}$, let us denote the set

$\mathcal{J}_i^* = \arg\max_{j \in \mathcal{J}} Q_j \left( \log \frac{\omega_i Q_j h_{ij}}{\lambda} \right)^+.$    (47)

Here the solution $\mathcal{J}_i^*$ is a set of indices of the chosen queues. If $\mathcal{J}_i^*$ is a singleton, we denote its unique element as $j_i^*$. If $\mathcal{J}_i^*$ is not a singleton, then since all elements in $\mathcal{J}_i^*$ lead to the same objective value, we can randomly pick one of its elements and denote it as $j_i^*$.
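For a given multiplier $\lambda$, the per-channel selection in (47) is a small search over queues. The sketch below evaluates it for a single channel; ties are broken by taking the first maximizer rather than a random one, and the function name and sample numbers are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def best_queue_for_channel(omega_i: float, h_i: Sequence[float],
                           Q: Sequence[float], lam: float) -> Tuple[Optional[int], float]:
    """Evaluate (47) for one channel: argmax_j Q_j * (log(omega_i*Q_j*h_ij/lam))^+.

    h_i[j] is the gain of this channel for queue j. Returns (None, 0.0) if no
    queue gives a positive value, i.e., the channel is left unassigned.
    Assumes lam > 0.
    """
    best_j, best_val = None, 0.0
    for j, (q, h) in enumerate(zip(Q, h_i)):
        if q <= 0.0 or h <= 0.0:
            continue  # an empty queue or a dead link cannot contribute
        val = q * max(0.0, math.log(omega_i * q * h / lam))
        if val > best_val:
            best_j, best_val = j, val
    return best_j, best_val


print(best_queue_for_channel(1.0, [1.0, 0.5], [2.0, 4.0], lam=1.0))
```

Because $\lambda$ itself depends on which queues are chosen, a routine like this would have to be called repeatedly while $\lambda$ is updated.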

Since the objective in problem (46) is linear in $T_{ij}$, it is easy to see that the optimal solution is

$T_{ij} = \begin{cases} 1, & \text{if } j = j_i^*, i \in \mathcal{B}, \\ 0, & \text{otherwise.} \end{cases}$    (48)

Now let us consider how to calculate the values of $\lambda$ and $j_i^*$. By (45) and (47), we find that they are actually coupled together. To determine $\lambda$ in (45), we need to know $T_{ij}$ (or equivalently $\mathcal{J}_i^*$, $j_i^*$), i.e., which queue is chosen for which channel. But determining $\mathcal{J}_i^*$ in (47) requires the value of $\lambda$. One way to solve this problem is to enumerate every possible channel assignment combination to find the solutions satisfying both (45) and (47). Since each channel $i \in \mathcal{B}_{\max}$ can be assigned to one of $J$ queues, there are a total of $J^{|\mathcal{B}_{\max}|}$ channel assignment combinations. When $J$ or $|\mathcal{B}_{\max}|$ is large, the complexity can be very high. However, we can reduce the search complexity by exploiting the special structure of the problem.

Property 1: For each channel allocated positive power, $i \in \mathcal{B}_p$, we have

$\mathcal{J}_i^* = \arg\max \left\{ Q_j \mid h_{ij} > \frac{\lambda}{\omega_i Q_j}, j \in \mathcal{J} \right\}.$    (49)

This property comes from (47). It means that for a particular channel $i$, if the channel gain for the longest queue $Q_j$ is good enough (i.e., $h_{ij} > \frac{\lambda}{\omega_i Q_j}$), we should assign channel $i$ to the queue with the longest queue length. If $\mathcal{J}_i^*$ is a singleton, we denote its unique element as $j_i^*$. Otherwise, we denote $\mathcal{J}_i^* = \arg\max \{h_{ij} \mid j \in \mathcal{J}_i^*\}$. In this case, there are multiple queues with the same channel gain and the same queue length, and we can randomly pick one and denote it as $j_i^*$. Thus we can search for $\lambda$ and $\mathcal{J}_i^*$ (and also $j_i^*$) by a simple greedy algorithm as follows. First, for all channels, we assume $\mathcal{J}_i^* = \arg\max_{j \in \mathcal{J}} Q_j$, and calculate the value of $\lambda$ by the waterfilling algorithm (the procedure "Waterfilling" in Alg. 4). For each unchosen channel, i.e., a channel $i$ with $\omega_i Q_{j_i^*} h_{i j_i^*} < \lambda$, we check whether $\omega_i Q_j h_{ij} > \lambda$ can be satisfied when another queue is chosen instead of $j_i^*$. If there is some set $\mathcal{J}_i$ of queues satisfying $\omega_i Q_j h_{ij} > \lambda$, we replace $\mathcal{J}_i^*$ with the one with the longest queue length in this set, i.e., $\arg\max_{j \in \mathcal{J}_i} Q_j$. We repeat the process iteratively until we find the $\lambda$ and $j_i^*$ that satisfy both (45) and (47). The pseudo code is given in Alg. 4. To simplify the expression of Alg. 4, with a slight abuse of notation, we denote $h_i = h_{i j_i^*}$, $Q_i = Q_{j_i^*}$, and $\Lambda(m) = \frac{\sum_{i=1}^{m} \omega_i Q_i}{P_{\max} + \sum_{i=1}^{m} \frac{1}{h_i}}$.

Algorithm 4 Channel Assignment

1: $\mathcal{J}_i^* \leftarrow \arg\max_{j \in \mathcal{J}} Q_j$
2: procedure WATERFILLING($h_i(t)$, $Q_i(t)$)
3:   Rearrange the channel indices $i \in \mathcal{B}_{\max}$ in decreasing order of $\omega_i(t) Q_i(t) h_i(t)$
4:   $m \leftarrow |\mathcal{B}(t)|$, $\lambda \leftarrow \Lambda(m)$
5:   while $\lambda > h_m(t) \omega_m(t)$ do
6:     $m \leftarrow m - 1$
7:     $\lambda \leftarrow \Lambda(m)$
8:   end while
9: end procedure
10: while $\omega_i(t) h_{ij}(t) Q_j(t) > \lambda$ for some $j$ and some $i > m$ do
11:   $\mathcal{J}_i^* \leftarrow \arg\max \{Q_j \mid \omega_i(t) h_{ij}(t) Q_j(t) > \lambda\}$
12:   invoke procedure Waterfilling($h_i(t)$, $Q_i(t)$)
13: end while



The complexity of Alg. 4 is $O(|\mathcal{B}_{\max}|^3 \log(|\mathcal{B}_{\max}|))$, since the while loop runs no more than $|\mathcal{B}_{\max}|$ times in the worst case, and the complexity of the waterfilling part is $O(|\mathcal{B}_{\max}|^2 \log(|\mathcal{B}_{\max}|))$ (the same as the waterfilling power allocation algorithm in Section 5.3.1). Compared to the exhaustive search, the complexity of solving the channel assignment is greatly reduced.

With the solution of the channel assignment, we can update the power allocation solution of (42) as

$P_{ij} = \begin{cases} Q_j \omega_i \left( \frac{1}{\lambda} - \frac{1}{\omega_i Q_j h_{ij}} \right)^+, & \text{if } j = j_i^*, i \in \mathcal{B}, \\ 0, & \text{otherwise,} \end{cases}$

where the values of $\lambda$, $h_i$, and $Q_i$ are calculated by Alg. 4.

After solving the second stage problem, we move to the first stage. Following the channel assignment in (48), we find that the first stage problem for heterogeneous users is the same as that of the homogeneous-user problem in (29). We can simply run the same Alg. 3 to determine the sensing technology $C^{s*}(t)$, the sensing channel set $\mathcal{B}^{s*}(t)$, and the leasing channel set $\mathcal{B}^{l*}(t)$.

6.2 The Performance of M-PMC Policy 

Next we show the performance bounds of the M-PMC policy. The proof method is similar to that of Theorem 1, and the details are omitted due to space limits.

Theorem 2: For any positive value $V$, the M-PMC policy has the following properties:

(a) The queue stability (11) and collision constraints (12) are satisfied. The queue length is upper bounded by

$Q_j(t) \le Q_{\max} \triangleq V q_{\max} + A_{\max}, \forall t,$    (50)

and the virtual queues are bounded by

$Z_i(t) \le Z_{\max} \triangleq \kappa (V q_{\max} + A_{\max}) + 1, \forall i, t.$    (51)

po(l-P fQ (C 3 (t))) + (l-po)P md (C 3 (t)) 
(1-Po) 



where k = r max p 
and C s0 denotes the highest sensing cost 



(b) The average profit $R_{M\text{-}PMC}$ obtained by the M-PMC policy satisfies

$\liminf R_{M\text{-}PMC} \ge R^* - O(1/V),$    (52)

where $R^*$ is the optimal value of the multi-queue PM problem.

7 Simulation 

In this section we provide simulation results for PMC and 
M-PMC policies. 

We conduct simulations with the following parameters. The 
number of incoming users in each slot satisfies a Poisson 
distribution with a rate D(q(t),M(t)) = j^(q(t)-5) 2 . The 
market state $M(t)$ follows a Bernoulli distribution: $M(t) = 1$ with probability 0.5, and $M(t) = 2$ with probability 0.5. The file length of each user follows an i.i.d. (discrete) uniform distribution between 1 and 10. There are 32 channels in total: 20 of them belong to the sensing band $\mathcal{B}^s_{\max}$, and the remaining 12 channels belong to the leasing band $\mathcal{B}^l_{\max}$. The primary collision probability tolerance levels are set as $\eta_i = 0.001$ for sensing channels $i = 1, 2, \ldots, 10$, and $\eta_i = 0.005$ for sensing channels $i = 11, 12, \ldots, 20$. The channel gain $h_i$ of each channel follows an i.i.d. (continuous) Rayleigh distribution with parameter $\sigma = 4.5$. The total power constraint of the base station is $P_{\max} = 8$. There are 3 different sensing technologies with costs $\mathcal{C}^s = \{0 \text{ (not sensing at all)}, 0.1, 0.5\}$. The corresponding false alarm probabilities are $P_{fa} = \{0.5, 0.1, 0.008\}$, and the missed detection probabilities are $P_{md} = \{0.5, 0.08, 0.005\}$. We assume the idle time probability of the sensing band $p_0$ is 0.6, and the control parameter $V \in \{5, 10, 50, 100, 200\}$.
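The random primitives of this setup can be drawn with the Python standard library, as in the sketch below. It covers only the per-slot channel gains, primary-user states, market state, and a file length. Reading the Rayleigh "parameter" as the scale $\sigma$ is an assumption, and all names are hypothetical; this is not the simulation code used for the reported results.

```python
import math
import random
from typing import List, Tuple

NUM_SENSING, NUM_LEASING = 20, 12  # channels in the sensing and leasing bands
P0 = 0.6                           # idle probability of a sensing channel
ETA = [0.001] * 10 + [0.005] * 10  # collision tolerance per sensing channel


def sample_rayleigh(sigma: float, rng: random.Random) -> float:
    """Inverse-CDF sample of a Rayleigh variable with scale sigma (assumed)."""
    u = rng.random()  # in [0, 1), so 1 - u is in (0, 1] and the log is finite
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


def sample_slot(rng: random.Random,
                sigma: float = 4.5) -> Tuple[List[float], List[int], int, int]:
    """Draw one slot: channel gains, PU idle indicators, market state, file length."""
    gains = [sample_rayleigh(sigma, rng) for _ in range(NUM_SENSING + NUM_LEASING)]
    idle = [1 if rng.random() < P0 else 0 for _ in range(NUM_SENSING)]
    market = 1 if rng.random() < 0.5 else 2
    file_len = rng.randint(1, 10)  # discrete uniform on {1, ..., 10}
    return gains, idle, market, file_len


rng = random.Random(0)  # fixed seed only to make runs repeatable
gains, idle, market, file_len = sample_slot(rng)
print(len(gains), sum(idle), market, file_len)
```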

Figure 5 shows the collision situation of all sensing channels with the control parameter $V = 100$. We find that the primary users' collision tolerance bound (12) is satisfied as time increases. We also obtain similar curves for the collision probabilities with other values of the control parameter $V$.

[Figure 5: collision situation of all sensing channels ($V = 100$) plotted against time $t$ (up to 5000 slots), with the PU's collision tolerant bound for channels 1-10 marked.]

Fig. 5. A collision situation of all sensing channels with $V = 100$



Figure 6(a) shows that the average queue length grows linearly in $V$, and is always less than the worst-case bound $V q_{\max} + A_{\max}$. Figure 6(b) shows that the average profit achieved by the PMC policy converges quickly as $V$ grows, and is close to the maximum profit when $V > 100$.

[Figure 6: (a) average queue length of the PMC algorithm and its upper bound vs. parameter $V$; (b) average profit vs. parameter $V$.]

Fig. 6. (a) Average queue length vs. Parameter V, (b) Average profit vs. Parameter V

[Figure 7: average profits with different sensing techniques (zero, low, high, and optimal sensing cost) vs. the sensing available probability $p_0$.]

Fig. 7. Average Profit with different sensing technologies

We further vary the idle time probability of the sensing band $p_0$ from 0 to 1. In Fig. 7, we show the average profit with different sensing available probabilities $p_0 \in [0, 1]$ and different fixed sensing technologies. The black curve is with zero sensing cost, where $P_{fa} = P_{md} = 0.5$, which means the operator does not perform sensing and takes random guesses of primary users' activities in sensing channels. The blue curve is with the low sensing cost 0.1, where $P_{fa} = 0.1$ and $P_{md} = 0.08$. The purple curve is with the high sensing cost 0.5, where $P_{fa} = 0.008$ and $P_{md} = 0.005$. The red curve is the PMC policy, which adaptively chooses the sensing cost from the above three (sensing cost $C^s = 0, 0.1, 0.5$). When the sensing available probability is small (e.g., $p_0 \in [0, 0.2]$), all strategies tend to choose only the leasing channels, so all curves obtain similar profits. When the sensing available probability $p_0$ further increases, the advantage of exploring sensing channels becomes more significant. The performances for strategies
using the zero and low sensing cost are not good. The reason is 
that their detection accuracy is not good enough. To achieve 
the primary users' collision bounds, these strategies choose 
sensing channels less often, and replace with more expensive 
leasing channels. When the sensing available probability is 
high and close to 1, sensing seems unnecessary. Therefore, 
the performance increases as sensing cost decreases, where the 
zero cost is the best and the high cost is the worst. The PMC 
policy (the red curve) adaptively chooses the sensing cost, i.e., 
when po is medium, it utilizes the high sensing cost strategy in 
most of time slots; as po keeps increasing, it gradually changes 
to utilize the low cost and zero cost strategies more frequently; 
and when po goes to 1, it utilizes the zero sensing cost strategy 
in most of time slots. It has the best performance, since it can 
take advantage of different sensing technologies for different 
sensing available probabilities. 

For the M-PMC policy, we conduct simulations for a simple
two-queue system. For queue-1, the channel gain of each
channel follows an i.i.d. (continuous) Rayleigh distribution with
parameter $\sigma = 4.5$. For queue-2, the channel gain of each
channel follows an i.i.d. (continuous) Rayleigh distribution with
parameter $\sigma = 5.5$. This models the case where queue-2 users
are closer to the operator's base station than queue-1 users.
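
As a rough illustration of this setup, the following Python sketch draws channel gains for the two queues and compares their sample means. It assumes that the stated parameter is the Rayleigh scale parameter; the helper names `rayleigh_sample` and `mean_gain`, the sample size, and the seed are hypothetical choices made only for this example.

```python
import math
import random

def rayleigh_sample(sigma: float, rng: random.Random) -> float:
    """Inverse-transform sample of a Rayleigh variable with scale sigma."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))

def mean_gain(sigma: float, n: int = 20000, seed: int = 1) -> float:
    """Sample mean of n i.i.d. Rayleigh channel gains (illustrative only)."""
    rng = random.Random(seed)
    return sum(rayleigh_sample(sigma, rng) for _ in range(n)) / n

mean_q1 = mean_gain(4.5)  # queue-1 users (assumed scale 4.5)
mean_q2 = mean_gain(5.5)  # queue-2 users (assumed scale 5.5)
print(mean_q1, mean_q2)
```

Under this assumption the theoretical mean gain is $\sigma\sqrt{\pi/2}$, so queue-2 channels are better on average, which is the property the simulation relies on.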

Figure 8(a) shows that the average transmission rate
obtained by queue-2 users is higher than that of queue-1
users. This is because queue-2 users usually have better
channel conditions, and the M-PMC policy prefers to allocate
more power to better channels to improve the transmission
rate. Figure 8(b) shows that the revenues obtained by the
operator from the users of the two queues are almost the same
when all queues are stable. This is an interesting observation,
which we can understand as follows. When the two queues generate
different revenues, the operator, to maximize the profit (and
hence the revenue), allocates more transmission rate to the queue
with the higher revenue. Thus the length of the queue with the
higher transmission rate is shortened, and the negative queuing
effect in the revenue maximization problem is soon diluted.
This in turn leads to a decreasing price and a decreasing revenue.
In contrast, the length of the queue with the lower transmission
rate increases, which results in an increasing price and an
increasing revenue. Therefore, once all queues are stable,
the average revenue generated by each queue is the same.

Fig. 8. (a) Average transmission rates of a two-queue M-PMC
policy, (b) Average revenues of a two-queue M-PMC policy



8 Conclusion 

In this paper, we study the profit maximization problem of
a cognitive mobile virtual network operator in a dynamic
network environment. We propose low-complexity PMC and
M-PMC policies which perform both revenue maximization
with pricing and market control, and cost minimization with
proper resource investment and allocation. We show that these
policies can achieve a profit arbitrarily close to the optimal
one, with a flexible trade-off between profit optimality and
queuing delay.

We also find several interesting features in these
close-to-optimal policies. In revenue maximization, the dynamic
pricing strategy acts as congestion control on users' demands,
i.e., the longer the queue of demands, the higher the price the
operator should charge. In cost minimization, the operator is
risk averse towards spectrum investment, and prefers stable
leasing spectrum to unstable sensing spectrum with the same
channel condition and the same availability-price-ratio.

In this paper, we only looked at elastic traffic.
It would be worthwhile to incorporate inelastic traffic, which
usually has strict constraints on transmission rates and delays.
Typical examples include real-time multimedia applications,
e.g., audio streaming, Video on Demand (VoD), and Voice
over IP (VoIP). In the most general case, we can consider
a hybrid system with both elastic and inelastic traffic, which
is more realistic and practical. In addition, as mentioned in
Section 2, the literature on competition in cognitive radio
networks mainly focuses on the static network scenario. It would
also be interesting to extend our dynamic model to incorporate
competition among several network operators.

Appendix A

Proof of Theorem 1(a)

We first prove (34) by induction.

It is easy to see that at slot t = 0, no packets are in
the network and Q(0) = 0, thus the queue length bound
(34) obviously holds. Now suppose (34) holds for time t. We
consider the queue length bound in slot t + 1 in the following
cases:

• Case 1: $Q(t) \le V q_{\max}$. Then clearly
$Q(t+1) \le V q_{\max} + A_{\max}$.

• Case 2: $Q(t) > V q_{\max}$, i.e., the objective value of the
revenue maximization (20) is negative. Therefore, according
to the optimization solution, the operator will set
$O(t) = 0$ and will not accept any new request, i.e., $A(t) = 0$.
Therefore, $Q(t+1) \le Q(t) \le V q_{\max} + A_{\max}$.
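
The induction above can be checked numerically. The following Python sketch is illustrative only. It assumes the queue evolves as Q(t+1) = max(Q(t) - r(t), 0) + O(t)A(t), that the operator admits demand whenever Q(t) <= V q_max and blocks it otherwise (as in Case 2), and that arrivals and service rates are uniform random variables. These distributions, the function name `simulate_queue`, and all parameter values are hypothetical.

```python
import random

def simulate_queue(V, q_max, A_max, r_max, slots=10000, seed=0):
    """Simulate the assumed queue dynamics and return the largest backlog seen."""
    rng = random.Random(seed)
    Q, peak = 0.0, 0.0
    for _ in range(slots):
        admit = 1 if Q <= V * q_max else 0   # market control O(t), assumed rule
        arrivals = rng.uniform(0.0, A_max)   # hypothetical demand, at most A_max
        service = rng.uniform(0.0, r_max)    # hypothetical transmission rate
        Q = max(Q - service, 0.0) + admit * arrivals
        peak = max(peak, Q)
    return peak

V, q_max, A_max = 20.0, 1.0, 5.0
peak = simulate_queue(V, q_max, A_max, r_max=3.0)
bound = V * q_max + A_max
print(peak, bound)
```

Whatever the random draws, the backlog in this sketch never exceeds V q_max + A_max, for exactly the two reasons given in Case 1 and Case 2.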

Likewise, we can also prove the virtual queue bound (35)
by induction. Suppose that the inequality (35) holds for time
t, and consider the following two cases.

• Case 1: $Z_i(t) \le Z_{\max} - 1$. Since the virtual queue grows
by at most one per slot, the virtual queue length bound (35)
clearly also holds at slot t + 1.

• Case 2: $Z_i(t) > Z_{\max} - 1$, i.e.,



r, ,.\ ''max' 

ZAt) > max — r 



(53) 



Since r max > ^ 
then inequality (T53b implies 



Ui(t)hi(t) 



V ' \ A(t) 

(54) 

By (32) in the PMC policy, channel i will not be
chosen for sensing and transmission, thus there will be
no collision in this channel, i.e., $X_i(t+1) = 0$. Then by
(13) we have $Z_i(t+1) \le Z_i(t)$, and the virtual queue
length bound (35) also holds at t + 1.



Appendix B 

Proof of Theorem 1(b)

We first construct a stationary randomized policy that can
achieve the optimal solution of the PM problem. Let us
consider a special class of stationary randomized policies,
called the $\phi$-only policy, which makes the decision $\gamma(t)$
in slot t depending only on the observation of the system
parameter $\phi(t)$. The stationary distribution of the
observable parameter $\phi(t)$ is denoted as
$\{\pi_\phi, \phi \in \Phi\}$. (Recall that we define
$\phi(t) = (M(t), h(t), C_l(t))$ and
$\gamma(t) = (O(t), q(t), C_s(t), B_s(t), B_l(t), P(t))$. Note that
the value of $\phi(t)$ can be chosen only from a finite set
$\Phi$.) In the $\phi$-only policy, when the operator observes
$\phi(t) = \phi$, it chooses $\gamma(t)$ from the countable
collection $\Gamma_\phi = \{\gamma_\phi^1, \gamma_\phi^2, \ldots\}$
with probabilities $\{p_\phi^1, p_\phi^2, \ldots\}$, where
$\sum_{u=1}^{\infty} p_\phi^u = 1$. Note that the
decision is independent of time t, and thus is stationary. We
have the following fact:

There exists a stationary $\phi$-only policy that achieves the
optimal profit of the PM problem while satisfying the stability
condition (11) and the collision upper-bound requirement (12),
and which is the solution of the following optimization problem:



$$R^* = \text{Maximize} \sum_{\phi \in \Phi} \pi_\phi \sum_{u=1}^{\infty} p_\phi^u \, R(\gamma_\phi^u; \phi)$$

Subject to the stability condition (11) and the collision bound
(12), both evaluated under the distribution given by $\pi_\phi$
and $p_\phi^u$, where $R(\gamma_\phi^u; \phi)$ denotes the profit
when decision $\gamma_\phi^u$ is taken under parameter $\phi$.

The above fact is a special case of Theorem 4.5 in [30]. The
proof is omitted for brevity.

Recall that the PMC policy is derived by minimizing the
right hand side of the following inequality

$$\Delta(\Theta(t)) - V E[R^{PMC}(t) \mid \Theta(t)] \le D - V E[R(t) \mid \Theta(t)] + Q(t) E[O(t)A(t) - r(t) \mid \Theta(t)] + \sum_i Z_i(t) E[X_i(t) - \eta_i \mid \Theta(t)]. \tag{55}$$

In other words, given the current queue backlogs for each
slot t, the PMC policy minimizes the right hand side of (55)
over all alternative feasible policies that could be implemented,
including the optimal stationary $\phi$-only policy. Under that
policy the expected profit equals $R^*$ and, by the constraints of
the optimization problem above, the last two terms on the right
hand side of (55) are non-positive. Therefore, by plugging the
optimal stationary $\phi$-only policy into the right hand side of
(55), we have

$$\Delta(\Theta(t)) - V E[R^{PMC}(t) \mid \Theta(t)] \le D - V R^*. \tag{56}$$

Now we use the following lemma to obtain the performance
bound in Theorem 1(b).

Lemma 1: (Lyapunov Optimization) Suppose there are finite
constants V > 0, D > 0, such that for all time slots
$t \in \{0, 1, 2, \ldots\}$ and all possible values of $\Theta(t)$,
we have

$$\Delta(\Theta(t)) - V E[R(t) \mid \Theta(t)] \le D - V R^*. \tag{57}$$

Then we have the following result

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E[R(\tau)] \ge R^* - \frac{D}{V}. \tag{58}$$

The above lemma is a special case of Theorem 4.2 in [30].
The proof is omitted for brevity.

Note that the inequality (56) is exactly the condition (57)
in Lemma 1, thus the performance bound in Theorem 1(b)
immediately follows.

Appendix C

Pseudo Code of Algorithm 5

Appendix D

Impact of Queueing on the Revenue Maximization Problem

What is the impact of the queuing effect on the pricing
in the revenue maximization problem (20)? Let us consider
the following instantaneous revenue maximization problem
without the queueing shift.

$$\text{Maximize} \quad q D(q, M). \tag{59}$$

For simplicity, we ignore the time index in the discussion.

Algorithm 5 Search Threshold $T_l$ (or $T_s$) for a given $C_s(t)$
procedure SearchingThreshold($C_s$)
    Rearrange the channel indices $i \in \mathcal{B}^l_{\max}$ (or
        $i \in \mathcal{B}^s_{\max}$) in decreasing order of $g_i(t)$.
    $m \leftarrow |\mathcal{B}^l_{\max}|$ (or $m \leftarrow |\mathcal{B}^s_{\max}|$), $\lambda \Leftarrow \lambda(m)$
    while $\lambda > g_m(t)$ do
        if m > 1 then
            $m \leftarrow m - 1$
            $\lambda \Leftarrow \lambda(m)$
        else
            break
        end if
    end while
    $T_l \leftarrow m$ (or $T_s \leftarrow m$)
end procedure
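
The loop structure of Algorithm 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the quantity that the pseudo code recomputes as a function of m is defined elsewhere in the paper, so here it is passed in as a hypothetical callable `level`, and the function name `search_threshold` and the example values are invented for illustration.

```python
from typing import Callable, Sequence

def search_threshold(gains: Sequence[float],
                     level: Callable[[int], float]) -> int:
    """Return the threshold m found by the loop of Algorithm 5.

    gains : per-channel metrics g_i(t), in any order (must be non-empty).
    level : hypothetical stand-in for the quantity recomputed from m
            in the pseudo code; its true definition is not reproduced here.
    """
    if not gains:
        raise ValueError("at least one channel is required")
    g = sorted(gains, reverse=True)   # decreasing order of g_i(t)
    m = len(g)                        # start from the full candidate set
    lam = level(m)
    while lam > g[m - 1]:             # g[m - 1] is g_m(t) with 1-based m
        if m > 1:
            m -= 1                    # drop the weakest remaining channel
            lam = level(m)
        else:
            break
    return m

# Example with made-up gains and a made-up level function.
threshold = search_threshold([1.0, 5.0, 3.0, 4.0], level=lambda m: 6.0 / m)
print(threshold)
```

Sorting first means the loop only ever compares the level against the weakest channel still under consideration, so the search needs at most one pass over the channels.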



Note that both problems in (20) and (59) may have multiple
optimal solutions. For the purpose of obtaining intuitions, we
restrict our discussion to the case where there is a unique
optimal price for both (20) and (59). To guarantee this, we
assume that the revenue $R(D) = q(D)D$ is a strictly concave
function of the demand$^5$, where $q(D)$ is defined as the inverse
demand function, i.e., $q(D) = \max\{q : D(q, m) = D\}$$^6$ for
a given m. For simplicity, we denote the optimal price in the
instantaneous revenue maximization problem (59) as $q^*$, and the
optimal price in the revenue maximization problem (20) with the
queuing shift as $q^{**}$. We will show that $q^{**} > q^*$, i.e.,
the queuing effect leads to a higher price.

The objective of the revenue maximization problem in (59) can
be represented as $R(D) = q(D)D$, and its optimal demand
is denoted as $D^*$. By the first-order optimality condition,
$D^*$ satisfies $R'(D^*) = 0$, where $R'(\cdot)$ denotes the
first-order derivative of $R(\cdot)$. When the queuing effect is
taken into consideration, the objective of (20) can be represented
as $R(D) - \frac{Q}{V} D$, and we denote its optimal demand as
$D^{**}$. Again by the first-order optimality condition, $D^{**}$
satisfies $R'(D^{**}) = \frac{Q}{V} > 0$. Since the revenue
function $R(D)$ is concave in D, $R'(D)$ is a decreasing function.
Since $R'(D^{**}) > R'(D^*)$, we obtain $D^{**} < D^*$.
Furthermore, since the demand $D(q, m)$ is non-increasing in price
q for a given m, we have $q^{**} > q^*$. In other words, when
incorporating the queuing effect, the optimal dynamic price
$q^{**}$ in the PMC policy is higher than the optimal price $q^*$
in the instantaneous revenue maximization problem without the
shift. Moreover, the larger the queue length Q, the higher the
dynamic price q in the PMC policy. When we perform such pricing in
the system, a high price will decrease the demand, which will slow
the increase of the queue length. Thus the dynamic pricing in
the PMC policy also performs the functionality of congestion
control to some extent.
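
To make the comparison concrete, the following Python sketch uses a hypothetical linear demand D(q) = max(a - bq, 0), for which the revenue is strictly concave in the demand. It assumes that the objective with the queuing shift is proportional to (q - Q/V) D(q) and finds the maximizing price by a simple grid search. The demand model, the function name `optimal_price`, and all numbers are illustrative assumptions, not taken from the paper.

```python
def optimal_price(a: float, b: float, shift: float = 0.0, grid: int = 10001) -> float:
    """Grid-search the price q in [0, a/b] that maximizes (q - shift) * D(q),
    with the assumed linear demand D(q) = max(a - b*q, 0)."""
    q_hi = a / b                       # demand is zero beyond this price
    best_q, best_val = 0.0, float("-inf")
    for k in range(grid):
        q = q_hi * k / (grid - 1)
        val = (q - shift) * max(a - b * q, 0.0)
        if val > best_val:
            best_q, best_val = q, val
    return best_q

a, b, V = 10.0, 2.0, 50.0
q_star = optimal_price(a, b)           # no queuing shift
shifted = [optimal_price(a, b, shift=Q / V) for Q in (0.0, 25.0, 50.0, 100.0)]
print(q_star, shifted)
```

For this demand model the maximizer has the closed form a/(2b) + Q/(2V), so the price rises linearly with the backlog Q, which matches the congestion-control reading of the dynamic price.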

For a more general case where the concavity assumption
may not be satisfied, the queueing effect depends on the shape
of the revenue function at the point $D^*$, i.e., $q^{**} > q^*$
if and only if $R'(D^*) < \frac{Q}{V}$.

5. This assumption is common in the revenue management literature
(e.g., [35]) to guarantee unique optimal pricing.

6. Since the demand function $D(q, m)$ is non-increasing in q,
there can be multiple prices resulting in the same demand.



